kou commented on pull request #8612:
URL: https://github.com/apache/arrow/pull/8612#issuecomment-725225424


   Thanks for your comments!
   
   > Looks you are returning a flat index array, does it make sense to return 
array of tuple (chunk_index, offset_in_chunk)? Maybe easier for client code to 
use?
   
   Each column in a table may have a different number of chunks. For example, 
`{"a": [[1, 2], [3], [4, 5, 6]], "b": [[1, 2, 3], [4, 5, 6]]}`. If we wanted to 
return `chunk_index` and `offset_in_chunk`, we would need to return them for each 
column, which may decrease performance.
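
To illustrate with the example table above, here is a rough Python sketch (not the code in this pull request; `resolve` is a hypothetical helper) of how the same flat row index maps to a different `(chunk_index, offset_in_chunk)` pair in each column:

```python
from bisect import bisect_right
from itertools import accumulate


def resolve(chunks, flat_index):
    """Map a flat row index to (chunk_index, offset_in_chunk) for one column."""
    # Cumulative end offsets of the chunks, e.g. [2, 3, 6] for lengths 2, 1, 3.
    ends = list(accumulate(len(chunk) for chunk in chunks))
    # The first chunk whose end offset is greater than flat_index holds the row.
    chunk_index = bisect_right(ends, flat_index)
    start = ends[chunk_index - 1] if chunk_index > 0 else 0
    return chunk_index, flat_index - start


table = {"a": [[1, 2], [3], [4, 5, 6]], "b": [[1, 2, 3], [4, 5, 6]]}

# Row 2 of the table lives at a different position in each column.
positions = {name: resolve(chunks, 2) for name, chunks in table.items()}
print(positions)
```

Row 2 resolves to `(1, 0)` in `"a"` and `(0, 2)` in `"b"`, so a tuple-based result would need one pair per column per row, whereas a flat index is shared by all columns.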
   
   > For multi column sorting, in one iteration, current code compares values 
column by column till first non-equal found. I don't know if a radix sort 
approach is better, e.g. sort by 2nd-order column first, then sort by 1st-order 
column. It may be possible to leverage existing array based sorting 
code(counting sort, etc).
   
   That's an interesting idea! I've added it to the follow-up tasks in the pull 
request description.
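
For reference, the suggested approach could look roughly like the Python sketch below. It is only an illustration under simplifying assumptions (plain lists instead of Arrow arrays, ascending order only, hypothetical function names), not the implementation in this pull request. It relies on each pass being a stable sort, so sorting by the least significant key first and the most significant key last gives the same order as comparing column by column:

```python
def sort_indices_multipass(columns, keys):
    """Sort row indices with one stable pass per key.

    `keys` lists column names from most to least significant.
    """
    indices = list(range(len(columns[keys[0]])))
    # Process the least significant key first; list.sort is stable, so
    # ties in a later pass keep the order set by the earlier passes.
    for key in reversed(keys):
        values = columns[key]
        indices.sort(key=lambda i: values[i])
    return indices


def sort_indices_compare(columns, keys):
    """Reference: compare rows column by column via tuple comparison."""
    n = len(columns[keys[0]])
    return sorted(range(n), key=lambda i: tuple(columns[k][i] for k in keys))


columns = {"x": [2, 1, 2, 1], "y": [0, 3, 1, 2]}
multipass = sort_indices_multipass(columns, ["x", "y"])
reference = sort_indices_compare(columns, ["x", "y"])
print(multipass, reference)
```

Each pass in the multi-pass version sorts a single column, which is where existing array-based sorting code could be reused.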


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

