kou commented on pull request #8612: URL: https://github.com/apache/arrow/pull/8612#issuecomment-730167841
> * sort(chunked array) sorts each chunk individually, then merges all of them (see `std::inplace_merge`) I've implemented. See also: https://github.com/apache/arrow/pull/8612#discussion_r526616431 > * sort(table) sorts each sort column in reverse order I wanted this a follow-up task but I've implemented this in this pull request. Because both of them mentions this approach. My implementation `TableRadixSorter` is naive but this approach is slower than the first sort key based approach: Summary: * The first sort key approach is 1.1x-27.2x faster than the radix sort approach except narrow range and 2 sort keys case. * The first sort key approach doesn't depend on the number of sort keys in wide range data. * The first sort key approach slowed down when the number of sort keys is increased in narrow range data. * The radix sort approach always slowed down when the number of sort keys is increased. * Because the radix sort approach always sorts all sort keys. ---- Description: N records: 1048576 Narrow range: Almost elements may be same Wide range: Almost elements will be different ---- N chunks: 4 Narrow range: N sort keys | 2 | 8 | 16 -------------------- | ----------- | ------------ | ------------ Radix | 318651531ns | 2353793768ns | 4627874931ns First sort key | 397339109ns | 595568690ns | 591632035ns Radix/First sort key | 0.8 | 3.9 | 7.8 Wide range: N sort keys | 2 | 8 | 16 -------------------- | ----------- | ------------ | ------------ Radix | 593904971ns | 2989364923ns | 6101922415ns First sort key | 220809969ns | 218214843ns | 223708791ns Radix/First sort key | 2.6 | 13.6 | 27.2 ---- N chunks: 32 Narrow range: N sort keys | 2 | 8 | 16 -------------------- | ------------ | ------------ | ------------ Radix | 1198835057ns | 6513599980ns | 13562068073ns First sort key | 1035218085ns | 1373174328ns | 1350031141ns Radix/First sort key | 1.1 | 4.7 | 10.0 Wide range: N sort keys | 2 | 8 | 16 -------------------- | ------------ | ------------ | ------------ Radix | 1585765386ns | 7305830182ns | 15323045709ns First sort key | 592304437ns | 589917731ns | 619193843ns Radix/First sort key | 2.6 | 12.3 | 24.7 ---- Can we mark the radix sort approach improvement a follow-up task? Or should we optimize the radix sort approach in this pull request? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
