yjshen commented on issue #2557:
URL:
https://github.com/apache/arrow-datafusion/issues/2557#issuecomment-1131011698
`group_indices` takes adjacent records in the same batch as input and makes a
best-effort attempt to group them into slices rather than individual
positions. This improves `extend` performance, because `extend` can then copy
a range of records at once instead of one record at a time.
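
To illustrate the grouping idea, here is a minimal sketch. It is not the
actual DataFusion code: `group_into_slices` is a hypothetical helper, and it
assumes the positions are already in ascending order. It collapses consecutive
positions into `(start, len)` pairs.

```rust
/// Collapses ascending row positions into `(start, len)` runs.
/// Hypothetical helper for illustration only; not the DataFusion function.
fn group_into_slices(positions: &[u32]) -> Vec<(u32, u32)> {
    let mut slices: Vec<(u32, u32)> = Vec::new();
    for &pos in positions {
        if let Some((start, len)) = slices.last_mut() {
            // `pos` directly follows the current run, so grow the run by one.
            if *start + *len == pos {
                *len += 1;
                continue;
            }
        }
        // Otherwise start a new run of length 1 at `pos`.
        slices.push((pos, 1));
    }
    slices
}
```

With positions `[0, 1, 2, 5, 6, 9]` this yields three slices, `(0, 3)`,
`(5, 2)` and `(9, 1)`, so `extend` would be called three times instead of six.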
`sort_unstable` is useful because the positions are generated by `lexsort`,
which is itself unstable: records with the same sort key may appear in
arbitrary order after `lexsort`. Since `extend` takes a start position and a
length as input, a sort is needed so that records with the same sort key
appear sequentially.
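
A rough sketch of why the sort helps, under the assumption that the positions
are distinct `u32` row indices within one batch. `count_runs` and
`sorted_copy` are hypothetical names used only for this example. Because the
positions are distinct, `sort_unstable` gives the same result a stable sort
would.

```rust
/// Counts how many `(start, len)` slices a sequence of positions would need:
/// a new slice starts whenever a position does not directly follow the
/// previous one. Hypothetical helper for illustration only.
fn count_runs(positions: &[u32]) -> usize {
    let mut runs = 0;
    let mut prev: Option<u32> = None;
    for &pos in positions {
        // `checked_add` avoids overflow if the previous position is u32::MAX.
        if prev.and_then(|p| p.checked_add(1)) != Some(pos) {
            runs += 1;
        }
        prev = Some(pos);
    }
    runs
}

/// Returns the positions in ascending order. The positions are assumed to be
/// distinct, so an unstable sort loses nothing compared to a stable one.
fn sorted_copy(positions: &[u32]) -> Vec<u32> {
    let mut sorted = positions.to_vec();
    sorted.sort_unstable();
    sorted
}
```

For positions arriving as `[4, 2, 3, 7, 5, 6]`, `count_runs` reports 4 slices
on the raw order but only 1 after `sorted_copy`, since the sorted positions
form the single range 2..=7.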