jhorstmann opened a new pull request, #7937:
URL: https://github.com/apache/arrow-rs/pull/7937

   # Which issue does this PR close?
   
   Optimize the `partition_validity` function used in the sort kernels:
       
   - Preallocate vectors based on known null counts
   - Avoid dynamic dispatch by calling `NullBuffer::is_valid` instead of `Array::is_valid`
   - Avoid capacity checks inside the loop by writing to `spare_capacity_mut` instead of using `push`
   - Closes #7936.
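
The three optimizations listed above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: `ValidityMask` is a hypothetical stand-in for arrow's `NullBuffer` (so the example compiles without the arrow crates), and the signature of `partition_validity` here is an assumption made for the example.

```rust
/// Hypothetical stand-in for arrow's `NullBuffer`: a bit-packed validity
/// mask that also records how many entries are null.
struct ValidityMask {
    bits: Vec<u8>,
    len: usize,
    null_count: usize,
}

impl ValidityMask {
    /// Build a mask from a slice of booleans (`true` means valid).
    fn from_bools(valid: &[bool]) -> Self {
        let mut bits = vec![0u8; (valid.len() + 7) / 8];
        let mut null_count = 0;
        for (i, &v) in valid.iter().enumerate() {
            if v {
                bits[i / 8] |= 1 << (i % 8);
            } else {
                null_count += 1;
            }
        }
        Self { bits, len: valid.len(), null_count }
    }

    /// Concrete (statically dispatched) validity check; no trait object involved.
    #[inline]
    fn is_valid(&self, i: usize) -> bool {
        assert!(i < self.len);
        self.bits[i / 8] & (1 << (i % 8)) != 0
    }
}

/// Split the indices `0..mask.len` into (valid indices, null indices).
/// Illustrative signature only; the real function may differ.
fn partition_validity(mask: &ValidityMask) -> (Vec<u32>, Vec<u32>) {
    let null_count = mask.null_count;
    let valid_count = mask.len - null_count;

    // 1. Preallocate both vectors from the known null count.
    let mut valid: Vec<u32> = Vec::with_capacity(valid_count);
    let mut nulls: Vec<u32> = Vec::with_capacity(null_count);

    {
        // 2. Write into the uninitialized spare capacity instead of calling
        //    `push`, so the loop has no "grow if full" branch. Slice indexing
        //    is still bounds-checked here to keep the sketch safe.
        let valid_slots = valid.spare_capacity_mut();
        let null_slots = nulls.spare_capacity_mut();
        let (mut v, mut n) = (0usize, 0usize);
        for i in 0..mask.len {
            if mask.is_valid(i) {
                valid_slots[v].write(i as u32);
                v += 1;
            } else {
                null_slots[n].write(i as u32);
                n += 1;
            }
        }
        // Every slot that `set_len` will expose below has been written.
        assert_eq!(v, valid_count);
        assert_eq!(n, null_count);
    }

    // SAFETY: exactly `valid_count` and `null_count` leading slots were
    // initialized above, and both values are within the reserved capacity.
    unsafe {
        valid.set_len(valid_count);
        nulls.set_len(null_count);
    }
    (valid, nulls)
}
```

For a mask built from `[true, false, true, true, false]`, this sketch yields the valid indices `[0, 2, 3]` and the null indices `[1, 4]`.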
   
   # Rationale for this change
   
   Microbenchmark results for `sort_kernels` compared to `main`, only looking at benchmarks matching "nulls to indices":
   
   ```
   sort i32 nulls to indices 2^10
                           time:   [4.9325 µs 4.9370 µs 4.9422 µs]
                           change: [−20.303% −20.133% −19.974%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   sort i32 nulls to indices 2^12
                           time:   [20.096 µs 20.209 µs 20.327 µs]
                           change: [−26.819% −26.275% −25.697%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   sort f32 nulls to indices 2^12
                           time:   [26.329 µs 26.366 µs 26.406 µs]
                           change: [−29.487% −29.331% −29.146%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   sort string[0-10] nulls to indices 2^12
                           time:   [70.667 µs 70.762 µs 70.886 µs]
                           change: [−20.057% −19.935% −19.819%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   sort string[0-100] nulls to indices 2^12
                           time:   [101.98 µs 102.44 µs 102.99 µs]
                           change: [−0.3501% +0.0835% +0.4918%] (p = 0.71 > 0.05)
                           No change in performance detected.
   
   sort string[0-400] nulls to indices 2^12
                           time:   [84.952 µs 85.024 µs 85.102 µs]
                           change: [−5.3969% −4.9827% −4.6421%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   sort string[10] nulls to indices 2^12
                           time:   [72.486 µs 72.664 µs 72.893 µs]
                           change: [−14.937% −14.781% −14.599%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   sort string[100] nulls to indices 2^12
                           time:   [71.354 µs 71.606 µs 71.902 µs]
                           change: [−17.207% −16.795% −16.373%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   sort string[1000] nulls to indices 2^12
                           time:   [73.088 µs 73.195 µs 73.311 µs]
                           change: [−16.705% −16.599% −16.483%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   sort string_view[10] nulls to indices 2^12
                           time:   [32.592 µs 32.654 µs 32.731 µs]
                           change: [−15.722% −15.512% −15.310%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   sort string_view[0-400] nulls to indices 2^12
                           time:   [32.981 µs 33.074 µs 33.189 µs]
                           change: [−25.570% −25.132% −24.700%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   sort string_view_inlined[0-12] nulls to indices 2^12
                           time:   [28.467 µs 28.496 µs 28.529 µs]
                           change: [−22.978% −22.786% −22.574%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   sort string[10] dict nulls to indices 2^12
                           time:   [94.463 µs 94.503 µs 94.542 µs]
                           change: [−11.386% −11.165% −10.961%] (p = 0.00 < 0.05)
                           Performance has improved.
   ```
   
   # Are these changes tested?
   
   Covered by existing tests.
   
   # Are there any user-facing changes?
   
   No, the method is internal to the sort kernels.


