tustvold opened a new issue, #7053: URL: https://github.com/apache/datafusion/issues/7053
### Is your feature request related to a problem or challenge?

Currently `SortExec::sort_batch_stream` uses `lexsort_to_indices` to sort the produced `RecordBatch`. For multi-column sorts this makes use of `LexicographicalComparator`. The branching and dynamic dispatch involved in this comparator are relatively expensive. Converting to the row format first and then comparing the resulting rows has been found to offer significant performance advantages in similar applications: https://github.com/apache/arrow-datafusion/pull/3386.

### Describe the solution you'd like

SortExec should:

* If there is a single sort column, use `sort_to_indices` to sort the input batches
* If there are multiple sort columns, convert to the row format and sort using this representation
* If performing a subsequent merge, preserve the row encoding to avoid redundant work

### Describe alternatives you've considered

_No response_

### Additional context

This is likely not a good first issue, and I do not recommend that people pick it up; I am creating it primarily for tracking purposes. I will likely pick it up at some point in the near-ish future.
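
To illustrate the multi-column idea from the proposal, here is a minimal, std-only sketch. It is not the arrow-rs row format (which lives in `arrow::row` as `RowConverter`) and not DataFusion code. Every function name below is hypothetical. It assumes two non-null, ascending sort columns (an `i32` and a string with no NUL bytes) and shows how each row can be encoded into a byte string whose plain byte-wise order matches the lexicographic order of the columns, so the sort needs no per-column dispatch.

```rust
/// Encode an i32 so that unsigned byte-wise comparison matches signed order:
/// flip the sign bit, then write the value big-endian.
fn encode_i32(v: i32, out: &mut Vec<u8>) {
    out.extend_from_slice(&((v as u32) ^ 0x8000_0000).to_be_bytes());
}

/// Encode a string as its UTF-8 bytes followed by a 0x00 terminator.
/// Assumption: the string contains no NUL bytes, so a shorter string
/// always sorts before any longer string that it prefixes.
fn encode_str(s: &str, out: &mut Vec<u8>) {
    debug_assert!(!s.as_bytes().contains(&0));
    out.extend_from_slice(s.as_bytes());
    out.push(0);
}

/// Hypothetical helper: build one encoded byte row per input row
/// from the two sort columns (no nulls, both ascending).
fn encode_rows(ints: &[i32], strs: &[&str]) -> Vec<Vec<u8>> {
    assert_eq!(ints.len(), strs.len());
    ints.iter()
        .zip(strs)
        .map(|(i, s)| {
            let mut row = Vec::with_capacity(4 + s.len() + 1);
            encode_i32(*i, &mut row);
            encode_str(s, &mut row);
            row
        })
        .collect()
}

/// Return the permutation that sorts the rows, comparing only the encoded bytes.
fn sort_to_indices_by_rows(rows: &[Vec<u8>]) -> Vec<usize> {
    let mut indices: Vec<usize> = (0..rows.len()).collect();
    // Stable sort, so rows with equal keys keep their input order.
    indices.sort_by(|&a, &b| rows[a].cmp(&rows[b]));
    indices
}
```

In this sketch the encoded `rows` outlive the sort, so a later merge step could compare the same byte rows again rather than re-encoding, which is the "preserve the row encoding" point in the list above. A real implementation would also need to handle nulls, descending order, and the remaining data types.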