jhorstmann opened a new issue #980:
URL: https://github.com/apache/arrow-rs/issues/980
**Is your feature request related to a problem or challenge? Please describe
what you are trying to do.**
There are two use cases for this feature:
- Some storage providers or engines are able to guarantee that the dictionary
is already sorted, i.e. that key order matches the order of the corresponding
strings, so sorting could be more efficient by comparing the keys instead of
looking up the strings.
- For the PARTITION BY part of window functions, the data does not have to be
sorted by the strings; sorting by the keys also yields a valid partitioning.
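
To make the two cases concrete, here is a minimal std-only sketch. `DictColumn` and its methods are hypothetical stand-ins, not arrow-rs types; it ignores nulls and assumes the dictionary values are unique, which is what makes key order sufficient for partitioning.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a dictionary-encoded string column:
/// row `i` holds the string `values[keys[i]]`. Nulls are ignored here.
struct DictColumn {
    keys: Vec<u32>,
    values: Vec<String>,
}

impl DictColumn {
    /// Sort row indices by the decoded strings (two lookups per comparison).
    fn sort_indices_by_values(&self) -> Vec<usize> {
        let mut idx: Vec<usize> = (0..self.keys.len()).collect();
        idx.sort_by(|&a, &b| {
            self.values[self.keys[a] as usize].cmp(&self.values[self.keys[b] as usize])
        });
        idx
    }

    /// Sort row indices by the integer keys only; the strings are never read.
    fn sort_indices_by_keys(&self) -> Vec<usize> {
        let mut idx: Vec<usize> = (0..self.keys.len()).collect();
        idx.sort_by_key(|&i| self.keys[i]);
        idx
    }
}

/// True if every distinct string occupies one contiguous run in `order`,
/// which is all that PARTITION BY needs.
fn is_partitioned(col: &DictColumn, order: &[usize]) -> bool {
    let mut seen = HashSet::new();
    let mut prev: Option<&str> = None;
    for &i in order {
        let s = col.values[col.keys[i] as usize].as_str();
        // A string that reappears after a different one breaks contiguity.
        if prev != Some(s) && !seen.insert(s) {
            return false;
        }
        prev = Some(s);
    }
    true
}
```

With a sorted dictionary both sorts return the same order. With an unsorted dictionary the key sort no longer matches string order, but equal strings still end up adjacent.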
**Describe the solution you'd like**
Add a flag `assume_sorted_dictionary` to `SortOptions`. In `sort_to_indices`
this flag is used in the branch for dictionary types, and if it is set we
sort the keys as a primitive array. The same distinction also needs to be
implemented in `build_compare` for the `lexsort_to_indices` kernel.
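
A rough sketch of how the flag could steer the comparison, written against plain slices rather than the real arrow-rs arrays. `SketchSortOptions`, `SketchDict`, `build_dict_compare` and `sort_dict_to_indices` are illustrative names only; `descending` and `nulls_first` mirror the existing `SortOptions` fields, while `assume_sorted_dictionary` is the proposed addition and does not exist yet.

```rust
use std::cmp::Ordering;

/// Local stand-in for arrow's `SortOptions`; `assume_sorted_dictionary`
/// is the flag proposed in this issue.
#[derive(Clone, Copy, Default)]
struct SketchSortOptions {
    descending: bool,
    nulls_first: bool,
    assume_sorted_dictionary: bool,
}

/// Simplified dictionary column: `None` keys are null rows.
struct SketchDict<'a> {
    keys: &'a [Option<u32>],
    values: &'a [&'a str],
}

/// Rough analogue of `build_compare` for one dictionary column: returns a
/// comparator over row indices that honours the proposed flag.
fn build_dict_compare<'a>(
    col: &'a SketchDict<'a>,
    options: SketchSortOptions,
) -> Box<dyn Fn(usize, usize) -> Ordering + 'a> {
    Box::new(move |a, b| {
        let ord = match (col.keys[a], col.keys[b]) {
            // Null placement is decided by `nulls_first` alone,
            // independent of `descending`.
            (None, None) => return Ordering::Equal,
            (None, Some(_)) => {
                return if options.nulls_first { Ordering::Less } else { Ordering::Greater }
            }
            (Some(_), None) => {
                return if options.nulls_first { Ordering::Greater } else { Ordering::Less }
            }
            // Flag set: compare the keys as primitives, no string lookup.
            (Some(ka), Some(kb)) if options.assume_sorted_dictionary => ka.cmp(&kb),
            // Flag unset: resolve both keys and compare the strings.
            (Some(ka), Some(kb)) => col.values[ka as usize].cmp(col.values[kb as usize]),
        };
        if options.descending { ord.reverse() } else { ord }
    })
}

/// Rough analogue of the dictionary branch in `sort_to_indices`.
fn sort_dict_to_indices(col: &SketchDict<'_>, options: SketchSortOptions) -> Vec<u32> {
    let cmp = build_dict_compare(col, options);
    let mut indices: Vec<u32> = (0..col.keys.len() as u32).collect();
    indices.sort_by(|&a, &b| cmp(a as usize, b as usize));
    indices
}
```

In this sketch the flag is purely an assertion by the caller: if it is set for a dictionary that is not actually sorted, the result follows key order rather than string order.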
**Additional context**
Once this is implemented, the window function logic in DataFusion could be
adjusted to take advantage of it.