kou commented on pull request #8612:
URL: https://github.com/apache/arrow/pull/8612#issuecomment-730167841


   >     * sort(chunked array) sorts each chunk individually, then merges all 
of them (see `std::inplace_merge`)
   
   I've implemented.
   See also: https://github.com/apache/arrow/pull/8612#discussion_r526616431
   
   >     * sort(table) sorts each sort column in reverse order
   
   I wanted this a follow-up task but I've implemented this in this pull 
request. Because both of them mentions this approach.
   
   My implementation `TableRadixSorter` is naive but this approach is slower 
than the first sort key based approach:
   
   Summary:
   
     * The first sort key approach is 1.1x-27.2x faster than the radix sort 
approach except narrow range and 2 sort keys case.
   
     * The first sort key approach doesn't depend on the number of sort keys in 
wide range data.
   
     * The first sort key approach slowed down when the number of sort keys is 
increased in narrow range data.
   
     * The radix sort approach always slowed down when the number of sort keys 
is increased.
       * Because the radix sort approach always sorts all sort keys.
   
   ----
   
   Description:
   
   N records: 1048576
   
   Narrow range: Almost elements may be same
   
   Wide range: Almost elements will be different
   
   ----
   
   N chunks: 4
   
   Narrow range:
   
   N sort keys          | 2           | 8            | 16
   -------------------- | ----------- | ------------ | ------------
   Radix                | 318651531ns | 2353793768ns | 4627874931ns
   First sort key       | 397339109ns | 595568690ns  | 591632035ns
   Radix/First sort key | 0.8         | 3.9          | 7.8
   
   Wide range:
   
   N sort keys          | 2           | 8            | 16
   -------------------- | ----------- | ------------ | ------------
   Radix                | 593904971ns | 2989364923ns | 6101922415ns
   First sort key       | 220809969ns | 218214843ns  | 223708791ns
   Radix/First sort key | 2.6         | 13.6         | 27.2
   
   ----
   
   N chunks: 32
   
   Narrow range:
   
   N sort keys          | 2            | 8            | 16
   -------------------- | ------------ | ------------ | ------------
   Radix                | 1198835057ns | 6513599980ns | 13562068073ns
   First sort key       | 1035218085ns | 1373174328ns | 1350031141ns
   Radix/First sort key | 1.1          | 4.7          | 10.0
   
   Wide range:
   
   N sort keys          | 2            | 8            | 16
   -------------------- | ------------ | ------------ | ------------
   Radix                | 1585765386ns | 7305830182ns | 15323045709ns
   First sort key       | 592304437ns  | 589917731ns  | 619193843ns
   Radix/First sort key | 2.6          | 12.3         | 24.7
   
   ----
   
   Can we mark the radix sort approach improvement a follow-up task? Or should 
we optimize the radix sort approach in this pull request?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to