tustvold opened a new issue, #5230: URL: https://github.com/apache/arrow-datafusion/issues/5230
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.** `SortPreservingMerge` now makes use of the [arrow row format](https://docs.rs/arrow-row/latest/arrow_row/) and this has yielded significant performance improvements over the prior DynComparator based approach. We can likely signifcantly improve the performance of `SortExec` by modifying `sort_batch` to make use of the row format when performing multi-column sorts instead of `lexsort_to_indices`, which internally uses DynComparator. For single-column sorts `lexsort_to_indices` will call through to `sort_to_indices` which will be faster than the row format, we should make sure to keep this special case. **Describe the solution you'd like** A first iteration could simply modify `sort_batch` to use the row format for multi-column sorts, as demonstrated [here](https://docs.rs/arrow-row/latest/arrow_row/#lexsort), falling back to `sort_to_indices` if only a single column. A second iteration could then look to find a way to convert to the row format once, and preserve this encoding when feeding sorted batches into `SortPreservingMerge`. **Describe alternatives you've considered** We could not do this **Additional context** FYI @alamb @mustafasrepo @ozankabak -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
