[PR] arrow-select: improve dictionary interleave fallback performance [arrow-rs]

via GitHub Wed, 10 Dec 2025 04:35:18 -0800


asubiotto opened a new pull request, #8978:
URL: https://github.com/apache/arrow-rs/pull/8978


   The naive interleave_fallback would use MutableArray and extend it with the 
full values slice each time the target array changed in the indices slice.
   
   This commit introduces a new approach where dictionary values are 
concatenated once and then new offsets are computed over these taking the 
indices into account. This results in 50-75% performance improvement in 
microbenchmarks and will also improve memory usage during interleaves (used 
heavily in sorts).
   
   Note that this path is only taken when should_merge_dictionary_values 
returns false.
   
   ```
   $ cargo bench --bench interleave_kernels -- 'dict' --baseline main
   interleave dict(20, 0.0) 100 [0..100, 100..230, 450..1000]
                           time:   [627.14 ns 634.76 ns 644.13 ns]
                           change: [−65.614% −65.345% −65.002%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 9 outliers among 100 measurements (9.00%)
     2 (2.00%) low mild
     6 (6.00%) high mild
     1 (1.00%) high severe
   
   interleave dict(20, 0.0) 400 [0..100, 100..230, 450..1000]
                           time:   [934.35 ns 937.51 ns 940.60 ns]
                           change: [−71.488% −71.340% −71.208%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   
   interleave dict(20, 0.0) 1024 [0..100, 100..230, 450..1000]
                           time:   [1.6485 µs 1.6528 µs 1.6566 µs]
                           change: [−74.307% −74.190% −74.088%] (p = 0.00 < 
0.05)
                           Performance has improved.
   
   interleave dict(20, 0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]
                           time:   [1.6723 µs 1.6782 µs 1.6842 µs]
                           change: [−74.664% −74.544% −74.438%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) low mild
     1 (1.00%) high severe
   
   interleave dict_sparse(20, 0.0) 100 [0..100, 100..230, 450..1000]
                           time:   [1.5985 µs 1.6064 µs 1.6148 µs]
                           change: [−12.510% −12.116% −11.715%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 19 outliers among 100 measurements (19.00%)
     10 (10.00%) low mild
     6 (6.00%) high mild
     3 (3.00%) high severe
   
   interleave dict_sparse(20, 0.0) 400 [0..100, 100..230, 450..1000]
                           time:   [1.9310 µs 1.9466 µs 1.9680 µs]
                           change: [−41.488% −41.091% −40.628%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 15 outliers among 100 measurements (15.00%)
     3 (3.00%) low mild
     6 (6.00%) high mild
     6 (6.00%) high severe
   
   interleave dict_sparse(20, 0.0) 1024 [0..100, 100..230, 450..1000]
                           time:   [2.7812 µs 2.8516 µs 2.9401 µs]
                           change: [−56.097% −55.276% −54.274%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 15 outliers among 100 measurements (15.00%)
     8 (8.00%) high mild
     7 (7.00%) high severe
   
   interleave dict_sparse(20, 0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]
                           time:   [3.4926 µs 3.6558 µs 3.8427 µs]
                           change: [−48.423% −46.405% −44.379%] (p = 0.00 < 
0.05)
                           Performance has improved.
   
   interleave dict_distinct 100
                           time:   [2.0013 µs 2.0106 µs 2.0205 µs]
                           change: [−1.6162% −1.0465% −0.4647%] (p = 0.00 < 
0.05)
                           Change within noise threshold.
   Found 4 outliers among 100 measurements (4.00%)
     4 (4.00%) high mild
   
   interleave dict_distinct 1024
                           time:   [1.9784 µs 1.9855 µs 1.9924 µs]
                           change: [−2.4655% −1.8461% −1.2265%] (p = 0.00 < 
0.05)
                           Performance has improved.
   
   interleave dict_distinct 2048
                           time:   [1.9832 µs 1.9959 µs 2.0087 µs]
                           change: [−2.9917% −2.3003% −1.6062%] (p = 0.00 < 
0.05)
                           Performance has improved.
   ```
   
   # Which issue does this PR close?
   
   Specific performance improvement, I believe issue is redundant.
   
   # Rationale for this change
   
   See commit message/PR description.
   
   # What changes are included in this PR?
   
   Ditto.
   
   # Are these changes tested?
   
   This PR adds an additional test for interleave_fallback dictionaries with 
nulls in addition to the existing tests.
   
   # Are there any user-facing changes?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[PR] arrow-select: improve dictionary interleave fallback performance [arrow-rs]

Reply via email to