cyb70289 commented on pull request #8612: URL: https://github.com/apache/arrow/pull/8612#issuecomment-725205535
@kou I'm okay with this patch. As you listed in the follow-up tasks, sorting the arrays separately and merging them afterwards should be faster. I think there are other opportunities to improve performance as well. Some random thoughts:

- It looks like you are returning a flat index array. Does it make sense to return an array of tuples (chunk_index, offset_in_chunk) instead? That may be easier for client code to use.
- For multi-column sorting, the current code compares values column by column within a single pass until it finds the first non-equal pair. I don't know whether a radix-sort-style approach would be better, e.g. sort by the 2nd-order column first, then by the 1st-order column (using a stable sort). It may then be possible to leverage the existing array-based sorting code (counting sort, etc.).
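
To make the multi-column idea concrete, here is a rough Python sketch, not Arrow code. It contrasts a column-by-column comparator with the radix-style alternative of stable-sorting by the least significant column first. The function names and the sample columns are hypothetical and only illustrate that both approaches produce the same ordering; they say nothing about which is faster in the C++ kernels.

```python
from functools import cmp_to_key


def sort_indices_compare(columns):
    """Single sort; the comparator walks the columns until values differ."""
    n = len(columns[0])

    def compare(i, j):
        for col in columns:
            if col[i] != col[j]:
                return -1 if col[i] < col[j] else 1
        return 0  # rows are equal in every column

    return sorted(range(n), key=cmp_to_key(compare))


def sort_indices_multipass(columns):
    """One stable single-column sort per column, least significant first."""
    n = len(columns[0])
    indices = list(range(n))
    for col in reversed(columns):
        # list.sort is stable, so the order from earlier passes survives ties.
        indices.sort(key=col.__getitem__)
    return indices


# Hypothetical data: `first` is the 1st-order key, `second` the 2nd-order key.
first = [2, 1, 2, 1, 2]
second = [5, 9, 3, 9, 3]

print(sort_indices_compare([first, second]))    # [1, 3, 2, 4, 0]
print(sort_indices_multipass([first, second]))  # [1, 3, 2, 4, 0]
```

The multi-pass version relies on each pass being stable; with an unstable single-column sort the ordering from earlier passes would be lost.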

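For the first suggestion, the following Python sketch shows how a flat index into a chunked array could be mapped to a (chunk_index, offset_in_chunk) pair. It assumes the chunk lengths are known up front; `resolve_flat_indices` is a hypothetical helper name, not an Arrow API.

```python
from bisect import bisect_right
from itertools import accumulate


def resolve_flat_indices(flat_indices, chunk_lengths):
    """Map flat indices to (chunk_index, offset_in_chunk) tuples."""
    # starts[k] is the flat index of the first element of chunk k.
    starts = [0] + list(accumulate(chunk_lengths))[:-1]
    resolved = []
    for idx in flat_indices:
        # Last chunk whose start is <= idx; this also skips empty chunks.
        chunk = bisect_right(starts, idx) - 1
        resolved.append((chunk, idx - starts[chunk]))
    return resolved


# Hypothetical layout: three chunks of lengths 3, 0 and 2.
print(resolve_flat_indices([4, 0, 3], [3, 0, 2]))  # [(2, 1), (0, 0), (2, 0)]
```

Each lookup is a binary search over the chunk starts, so client code that receives flat indices pays O(log number_of_chunks) per index; returning the tuples directly would move that cost into the sort kernel.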