[PR] perf: optimise and bench first_value, last_value aggregate [datafusion]

via GitHub Sun, 05 Apr 2026 02:40:06 -0700


theirix opened a new pull request, #21383:
URL: https://github.com/apache/datafusion/pull/21383


   ## Which issue does this PR close?
   
   - Closes #.
   
   ## Rationale for this change
   
   A minor refactoring of `first_last.rs` that improves performance (up to 32%) and addresses the naming concerns raised in TODOs.
   
   ## What changes are included in this PR?
   
   - Optimise memory allocation in `take_state`: avoid copying vectors and buffers
   - Optimise extracting single elements
   - Pre-compute common data for sorting
   - Rename structs and functions as recommended in the TODOs (the majority of changes in this PR)
   - Add a benchmark. Benchmarking aggregates with grouping is tricky because many operations are stateful, so I introduced an end-to-end `evaluate` bench (to actually exercise taking state) and a `convert_to_state` bench (as in other benches)
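The `take_state` item can be pictured with a minimal sketch, assuming per-group state lives in plain `Vec`s. The `Emit` enum and `take_needed` function below are hypothetical stand-ins, loosely modelled on the idea behind DataFusion's `EmitTo`, and are not the code in this PR. Emitting all groups hands the buffer over with `std::mem::take`. Emitting the first `n` groups uses `split_off` plus `mem::swap`. In neither case is any element cloned.

```rust
use std::mem;

/// Hypothetical emit mode: emit every group, or only the first `n` groups.
#[derive(Debug, Clone, Copy)]
enum Emit {
    All,
    First(usize),
}

/// Move the emitted prefix of `v` out without cloning any element.
fn take_needed<T>(v: &mut Vec<T>, emit: Emit) -> Vec<T> {
    match emit {
        // Hand over the whole buffer; `v` is left empty, nothing is copied.
        Emit::All => mem::take(v),
        Emit::First(n) => {
            // `split_off(n)` moves groups n.. into a new Vec and leaves the
            // first n in place (it panics if n > v.len()).
            let mut tail = v.split_off(n);
            // Put the remaining groups back in `v` and return the emitted head.
            mem::swap(v, &mut tail);
            tail
        }
    }
}

fn main() {
    let mut state = vec![10, 20, 30, 40];
    let first_two = take_needed(&mut state, Emit::First(2));
    assert_eq!(first_two, vec![10, 20]);
    assert_eq!(state, vec![30, 40]);

    let rest = take_needed(&mut state, Emit::All);
    assert_eq!(rest, vec![30, 40]);
    assert!(state.is_empty());
}
```

`split_off` still moves the tail into a fresh allocation. The saving comes from avoiding element-wise clones of the whole state.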
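One way to picture "pre-compute common data for sorting" is a grouped `first_value(v ORDER BY k)`. The sort direction and null placement are resolved once per batch and then reused for every row comparison. The sketch below assumes a single nullable `i64` ordering key. `SortOpts`, `cmp_keys` and `first_value_by_group` are hypothetical names, and the data the PR actually pre-computes may differ.

```rust
use std::cmp::Ordering;

/// Hypothetical sort settings for a single ORDER BY key, resolved once per batch.
#[derive(Clone, Copy)]
struct SortOpts {
    descending: bool,
    nulls_first: bool,
}

/// Compare two nullable ordering keys under fixed sort settings.
fn cmp_keys(a: Option<i64>, b: Option<i64>, opts: SortOpts) -> Ordering {
    match (a, b) {
        (None, None) => Ordering::Equal,
        (None, Some(_)) if opts.nulls_first => Ordering::Less,
        (None, Some(_)) => Ordering::Greater,
        (Some(_), None) if opts.nulls_first => Ordering::Greater,
        (Some(_), None) => Ordering::Less,
        (Some(x), Some(y)) => {
            let ord = x.cmp(&y);
            if opts.descending { ord.reverse() } else { ord }
        }
    }
}

/// For each group, return the value of the row whose key sorts first.
/// `opts` is computed once by the caller and shared by every comparison.
fn first_value_by_group(
    group_idx: &[usize],
    values: &[i64],
    keys: &[Option<i64>],
    num_groups: usize,
    opts: SortOpts,
) -> Vec<Option<i64>> {
    // Per-group best (ordering key, value) seen so far.
    let mut best: Vec<Option<(Option<i64>, i64)>> = vec![None; num_groups];
    for ((&g, &v), &k) in group_idx.iter().zip(values).zip(keys) {
        let replace = match best[g] {
            None => true,
            // Strictly less: on ties the earlier row wins.
            Some((cur_key, _)) => cmp_keys(k, cur_key, opts) == Ordering::Less,
        };
        if replace {
            best[g] = Some((k, v));
        }
    }
    best.into_iter().map(|b| b.map(|(_, v)| v)).collect()
}

fn main() {
    let opts = SortOpts { descending: false, nulls_first: false };
    let out = first_value_by_group(
        &[0, 1, 0, 1],
        &[10, 20, 30, 40],
        &[Some(5), Some(3), Some(1), None],
        2,
        opts,
    );
    // Group 0: key 1 sorts first -> 30; group 1: key 3 beats a trailing null -> 20.
    assert_eq!(out, vec![Some(30), Some(20)]);
}
```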
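Some context for the `convert_to_state` bench. In DataFusion, `GroupsAccumulator::convert_to_state` turns an input batch directly into intermediate state (used when partial aggregation is skipped), so each input row becomes its own one-row state. The sketch below shows that idea for a simplified `first_value` state of (value, is_set). It assumes nulls are respected, and that a row excluded by the filter (filter entry `false` or null) produces an unset state. `RowState` and this `convert_to_state` signature are hypothetical, not DataFusion's API.

```rust
/// Hypothetical intermediate state for a simplified first_value:
/// the candidate value plus a flag recording whether the row contributed.
#[derive(Debug, PartialEq)]
struct RowState {
    value: Option<i64>,
    is_set: bool,
}

/// Turn every input row into its own single-row state.
/// Rows whose filter entry is `Some(false)` or `None` produce an unset state.
fn convert_to_state(values: &[Option<i64>], filter: Option<&[Option<bool>]>) -> Vec<RowState> {
    values
        .iter()
        .enumerate()
        .map(|(i, &value)| {
            let keep = match filter {
                None => true,
                Some(f) => f[i] == Some(true),
            };
            if keep {
                // Nulls are respected: a null input is still a valid "first" value.
                RowState { value, is_set: true }
            } else {
                RowState { value: None, is_set: false }
            }
        })
        .collect()
}

fn main() {
    let values: [Option<i64>; 3] = [Some(1), None, Some(3)];
    let filter = [Some(true), Some(true), None];
    let states = convert_to_state(&values, Some(&filter[..]));
    assert_eq!(states[0], RowState { value: Some(1), is_set: true });
    assert_eq!(states[1], RowState { value: None, is_set: true });
    assert_eq!(states[2], RowState { value: None, is_set: false });
}
```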
   
   ## Are these changes tested?
   
   - Existing unit and integration tests
   - The new bench produces meaningful results
   
   Improvements: up to 32%
   
   Raw bench result:
   <details>
   first_value convert_to_state nulls=0%, filter=false
                           time:   [98.086 µs 99.014 µs 100.25 µs]
                           change: [−15.400% −13.532% −11.559%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 17 outliers among 100 measurements (17.00%)
     4 (4.00%) low mild
     3 (3.00%) high mild
     10 (10.00%) high severe
   
   Benchmarking first_value evaluate_bench nulls=0%, filter=false, first(2): Collecting 100 samples in estimated 7.1575 s (10k ite
   first_value evaluate_bench nulls=0%, filter=false, first(2)
                           time:   [54.938 µs 55.482 µs 56.060 µs]
                           change: [−38.163% −36.295% −34.463%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 9 outliers among 100 measurements (9.00%)
     5 (5.00%) high mild
     4 (4.00%) high severe
   
   Benchmarking first_value evaluate_bench nulls=0%, filter=false, all: Collecting 100 samples in estimated 7.0741 s (10k iteratio
   first_value evaluate_bench nulls=0%, filter=false, all
                           time:   [50.624 µs 51.092 µs 51.612 µs]
                           change: [−17.955% −16.332% −14.593%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 9 outliers among 100 measurements (9.00%)
     5 (5.00%) high mild
     4 (4.00%) high severe
   
   first_value convert_to_state nulls=0%, filter=true
                           time:   [2.0647 µs 2.0881 µs 2.1148 µs]
                           change: [−7.6314% −6.2174% −4.8530%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     2 (2.00%) high mild
     6 (6.00%) high severe
   
   Benchmarking first_value evaluate_bench nulls=0%, filter=true, first(2): Collecting 100 samples in estimated 9.6354 s (10k iter
   first_value evaluate_bench nulls=0%, filter=true, first(2)
                           time:   [54.708 µs 55.240 µs 55.805 µs]
                           change: [−27.249% −24.755% −22.195%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     3 (3.00%) high mild
     2 (2.00%) high severe
   
   Benchmarking first_value evaluate_bench nulls=0%, filter=true, all: Collecting 100 samples in estimated 9.6394 s (10k iteration
   first_value evaluate_bench nulls=0%, filter=true, all
                           time:   [50.540 µs 50.963 µs 51.424 µs]
                           change: [−6.8346% −3.9146% −0.8864%] (p = 0.01 < 0.05)
                           Change within noise threshold.
   Found 4 outliers among 100 measurements (4.00%)
     1 (1.00%) high mild
     3 (3.00%) high severe
   
   Benchmarking first_value convert_to_state nulls=90%, filter=false: Collecting 100 samples in estimated 5.0440 s (50k iterations
   first_value convert_to_state nulls=90%, filter=false
                           time:   [98.054 µs 98.996 µs 100.10 µs]
                           change: [−3.4798% −2.2129% −1.0173%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 19 outliers among 100 measurements (19.00%)
     1 (1.00%) low severe
     8 (8.00%) low mild
     2 (2.00%) high mild
     8 (8.00%) high severe
   
   Benchmarking first_value evaluate_bench nulls=90%, filter=false, first(2): Collecting 100 samples in estimated 8.5385 s (10k it
   first_value evaluate_bench nulls=90%, filter=false, first(2)
                           time:   [53.780 µs 54.673 µs 55.639 µs]
                           change: [−17.702% −15.978% −14.173%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     1 (1.00%) low mild
     5 (5.00%) high mild
     2 (2.00%) high severe
   
   Benchmarking first_value evaluate_bench nulls=90%, filter=false, all: Collecting 100 samples in estimated 8.2692 s (10k iterati
   first_value evaluate_bench nulls=90%, filter=false, all
                           time:   [49.851 µs 50.289 µs 50.755 µs]
                           change: [−4.8554% −3.1896% −1.3951%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     3 (3.00%) low severe
     2 (2.00%) high mild
     3 (3.00%) high severe
   
   Benchmarking first_value convert_to_state nulls=90%, filter=true: Collecting 100 samples in estimated 5.0077 s (2.4M iterations
   first_value convert_to_state nulls=90%, filter=true
                           time:   [2.0339 µs 2.0465 µs 2.0603 µs]
                           change: [−1.8037% −0.7068% +0.3821%] (p = 0.22 > 0.05)
                           No change in performance detected.
   Found 8 outliers among 100 measurements (8.00%)
     2 (2.00%) low mild
     3 (3.00%) high mild
     3 (3.00%) high severe
   
   Benchmarking first_value evaluate_bench nulls=90%, filter=true, first(2): Collecting 100 samples in estimated 9.7925 s (10k ite
   first_value evaluate_bench nulls=90%, filter=true, first(2)
                           time:   [54.544 µs 55.119 µs 55.720 µs]
                           change: [−15.717% −13.982% −12.279%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) low mild
     1 (1.00%) high severe
   
   Benchmarking first_value evaluate_bench nulls=90%, filter=true, all: Collecting 100 samples in estimated 9.7401 s (10k iteratio
   first_value evaluate_bench nulls=90%, filter=true, all
                           time:   [50.126 µs 50.886 µs 51.703 µs]
                           change: [+0.8379% +4.1713% +7.1024%] (p = 0.00 < 0.05)
                           Change within noise threshold.
   Found 9 outliers among 100 measurements (9.00%)
     6 (6.00%) high mild
     3 (3.00%) high severe
   
   last_value convert_to_state nulls=0%, filter=false
                           time:   [97.957 µs 98.314 µs 98.692 µs]
                           change: [−2.8086% −2.0315% −1.2541%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 10 outliers among 100 measurements (10.00%)
     7 (7.00%) high mild
     3 (3.00%) high severe
   
   Benchmarking last_value evaluate_bench nulls=0%, filter=false, first(2): Collecting 100 samples in estimated 7.0582 s (10k iter
   last_value evaluate_bench nulls=0%, filter=false, first(2)
                           time:   [52.692 µs 53.414 µs 54.144 µs]
                           change: [−22.228% −20.636% −19.153%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     1 (1.00%) low mild
     2 (2.00%) high mild
     2 (2.00%) high severe
   
   Benchmarking last_value evaluate_bench nulls=0%, filter=false, all: Collecting 100 samples in estimated 6.9411 s (10k iteration
   last_value evaluate_bench nulls=0%, filter=false, all
                           time:   [49.781 µs 50.226 µs 50.793 µs]
                           change: [−1.9658% −0.1634% +1.5825%] (p = 0.86 > 0.05)
                           No change in performance detected.
   Found 11 outliers among 100 measurements (11.00%)
     1 (1.00%) low mild
     6 (6.00%) high mild
     4 (4.00%) high severe
   
   last_value convert_to_state nulls=0%, filter=true
                           time:   [2.0639 µs 2.0781 µs 2.0949 µs]
                           change: [−0.7535% +0.4491% +1.6752%] (p = 0.47 > 0.05)
                           No change in performance detected.
   Found 11 outliers among 100 measurements (11.00%)
     3 (3.00%) low mild
     4 (4.00%) high mild
     4 (4.00%) high severe
   
   Benchmarking last_value evaluate_bench nulls=0%, filter=true, first(2): Collecting 100 samples in estimated 9.8040 s (10k itera
   last_value evaluate_bench nulls=0%, filter=true, first(2)
                           time:   [53.779 µs 54.311 µs 54.868 µs]
                           change: [−15.863% −14.071% −12.391%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 7 outliers among 100 measurements (7.00%)
     4 (4.00%) high mild
     3 (3.00%) high severe
   
   Benchmarking last_value evaluate_bench nulls=0%, filter=true, all: Collecting 100 samples in estimated 9.6860 s (10k iterations
   last_value evaluate_bench nulls=0%, filter=true, all
                           time:   [50.276 µs 50.794 µs 51.429 µs]
                           change: [−1.6780% +0.0697% +1.9541%] (p = 0.94 > 0.05)
                           No change in performance detected.
   Found 6 outliers among 100 measurements (6.00%)
     2 (2.00%) high mild
     4 (4.00%) high severe
   
   last_value convert_to_state nulls=90%, filter=false
                           time:   [97.508 µs 98.412 µs 99.486 µs]
                           change: [−1.2994% −0.3404% +0.8319%] (p = 0.52 > 0.05)
                           No change in performance detected.
   Found 9 outliers among 100 measurements (9.00%)
     2 (2.00%) low mild
     3 (3.00%) high mild
     4 (4.00%) high severe
   
   Benchmarking last_value evaluate_bench nulls=90%, filter=false, first(2): Collecting 100 samples in estimated 8.9282 s (10k ite
   last_value evaluate_bench nulls=90%, filter=false, first(2)
                           time:   [54.064 µs 54.748 µs 55.433 µs]
                           change: [−15.790% −13.678% −11.790%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     3 (3.00%) low mild
     1 (1.00%) high mild
     1 (1.00%) high severe
   
   Benchmarking last_value evaluate_bench nulls=90%, filter=false, all: Collecting 100 samples in estimated 9.2411 s (10k iteratio
   last_value evaluate_bench nulls=90%, filter=false, all
                           time:   [49.964 µs 50.731 µs 51.630 µs]
                           change: [−2.4530% −0.2470% +1.8407%] (p = 0.82 > 0.05)
                           No change in performance detected.
   Found 9 outliers among 100 measurements (9.00%)
     8 (8.00%) high mild
     1 (1.00%) high severe
   
   last_value convert_to_state nulls=90%, filter=true
                           time:   [2.0660 µs 2.0874 µs 2.1139 µs]
                           change: [−3.2299% −1.8585% −0.5850%] (p = 0.01 < 0.05)
                           Change within noise threshold.
   Found 10 outliers among 100 measurements (10.00%)
     4 (4.00%) high mild
     6 (6.00%) high severe
   
   Benchmarking last_value evaluate_bench nulls=90%, filter=true, first(2): Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.6s, enable flat sampling, or reduce sample count to 60.
   Benchmarking last_value evaluate_bench nulls=90%, filter=true, first(2): Collecting 100 samples in estimated 5.6274 s (5050 ite
   last_value evaluate_bench nulls=90%, filter=true, first(2)
                           time:   [53.565 µs 54.399 µs 55.381 µs]
                           change: [−18.757% −16.680% −14.334%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     8 (8.00%) high mild
   
   Benchmarking last_value evaluate_bench nulls=90%, filter=true, all: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.3s, enable flat sampling, or reduce sample count to 60.
   Benchmarking last_value evaluate_bench nulls=90%, filter=true, all: Collecting 100 samples in estimated 5.3402 s (5050 iteratio
   last_value evaluate_bench nulls=90%, filter=true, all
                           time:   [49.473 µs 50.055 µs 50.743 µs]
                           change: [−41.448% −32.836% −24.471%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 9 outliers among 100 measurements (9.00%)
     5 (5.00%) high mild
     4 (4.00%) high severe
   
   cargo bench --bench first_last -- --baseline main-first_last3  810.16s user 14.72s system 135% cpu 10:09.30 total
   </details>
   
   ## Are there any user-facing changes?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

