jorgecarleitao opened a new pull request #8165:
URL: https://github.com/apache/arrow/pull/8165


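   This PR speeds up some of the aggregation kernels in Arrow by roughly 8% to 58% (median estimates below), depending on the kernel and on whether the data contains nulls. It does so by simplifying their logic, which lets the compiler's optimizer do its work.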
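A minimal sketch of the kind of simplification described above, assuming a hypothetical column type (`Int64Column`, with a `Vec<bool>` validity mask instead of Arrow's packed bitmap). The idea is to check for nulls once per array and then run a tight loop, rather than branching inside the loop. This is an illustration only, not the code from this PR.

```rust
/// Hypothetical stand-in for an Arrow primitive array: a values buffer plus
/// an optional validity mask. Real Arrow packs validity into a bitmap; a
/// `Vec<bool>` is used here only to keep the sketch short.
struct Int64Column {
    values: Vec<i64>,
    validity: Option<Vec<bool>>, // `None` means "no nulls"
}

/// Sum without nulls: one branch-free fold over the slice, a shape the
/// optimizer can typically unroll and vectorize.
fn sum_no_nulls(values: &[i64]) -> i64 {
    values.iter().fold(0i64, |acc, &v| acc.wrapping_add(v))
}

/// Sum with nulls: null slots contribute 0 instead of taking a separate
/// control-flow path per element. Assumes both slices have the same length.
fn sum_with_nulls(values: &[i64], validity: &[bool]) -> i64 {
    values
        .iter()
        .zip(validity)
        .fold(0i64, |acc, (&v, &is_valid)| {
            acc.wrapping_add(if is_valid { v } else { 0 })
        })
}

/// The null check happens once per column, not once per element.
/// In this sketch, an empty or all-null column has no sum (`None`).
fn sum(col: &Int64Column) -> Option<i64> {
    match &col.validity {
        None if col.values.is_empty() => None,
        None => Some(sum_no_nulls(&col.values)),
        Some(valid) if !valid.iter().any(|&b| b) => None,
        Some(valid) => Some(sum_with_nulls(&col.values, valid)),
    }
}

/// Min: the no-null path delegates directly to the iterator's `min`;
/// the null path skips invalid slots first.
fn min(col: &Int64Column) -> Option<i64> {
    match &col.validity {
        None => col.values.iter().copied().min(),
        Some(valid) => col
            .values
            .iter()
            .zip(valid)
            .filter(|&(_, &ok)| ok)
            .map(|(&v, _)| v)
            .min(),
    }
}
```

Keeping the no-null path free of per-element checks is what gives the optimizer room to work; the null path still pays for reading the mask, which is consistent with the smaller gains reported for the `nulls` cases.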
   
   The first 3 commits (up to 29754d7) only improve the benchmark itself by:
   * measuring only the computation, not the creation of the arrays
   * moving the benchmark to its own file
   * adding randomness to the data, to reduce spurious results caused by speculative execution and similar effects
   * adding a case for data with nulls, since the kernels branch on that condition
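The benchmark changes above can be sketched with the standard library alone. Everything here (`XorShift64`, `make_data`, `time_kernel`) is hypothetical; the actual benchmark uses criterion. The sketch shows the two ideas: the random values and null mask are built once, outside the timed region, and random data keeps the CPU's branch predictor from learning a regular pattern.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical xorshift64 PRNG so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Builds `len` random values in 0..1000 and a validity mask in which
/// roughly `null_density` of the slots are null (`false`).
fn make_data(len: usize, null_density: f64, seed: u64) -> (Vec<i64>, Vec<bool>) {
    let mut rng = XorShift64(seed.max(1)); // xorshift must not be seeded with 0
    let mut values = Vec::with_capacity(len);
    let mut valid = Vec::with_capacity(len);
    for _ in 0..len {
        values.push((rng.next_u64() % 1000) as i64);
        // Uniform draw in [0, 1) from the top 53 bits.
        let u = (rng.next_u64() >> 11) as f64 / (1u64 << 53) as f64;
        valid.push(u >= null_density);
    }
    (values, valid)
}

/// Times only `kernel` (the data already exists), averaged over `iters` runs.
/// `black_box` keeps the compiler from optimizing the result away.
fn time_kernel<F: FnMut() -> i64>(iters: u32, mut kernel: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(kernel());
    }
    start.elapsed() / iters
}
```

A fixed seed keeps runs reproducible, so two benchmark runs (before and after a change) see the same "random" data.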
   
   The last 3 commits are the optimizations themselves.
   
   ```
   sum 512                 time:   [535.66 ns 536.11 ns 536.57 ns]
                           change: [-58.421% -58.222% -57.957%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 10 outliers among 100 measurements (10.00%)
     4 (4.00%) high mild
     6 (6.00%) high severe
   
   min 512                 time:   [766.77 ns 775.85 ns 788.35 ns]
                           change: [-41.555% -41.017% -40.388%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 10 outliers among 100 measurements (10.00%)
     4 (4.00%) high mild
     6 (6.00%) high severe
   
   sum nulls 512           time:   [1.0968 us 1.1000 us 1.1038 us]
                           change: [-8.9918% -7.6232% -5.7130%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 15 outliers among 100 measurements (15.00%)
     3 (3.00%) high mild
     12 (12.00%) high severe
   
   min nulls 512           time:   [1.3208 us 1.3242 us 1.3286 us]
                           change: [-11.028% -10.240% -9.4581%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 11 outliers among 100 measurements (11.00%)
     3 (3.00%) high mild
     8 (8.00%) high severe
   ```
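Criterion's `change` line is a relative difference in time, so a change of c% corresponds to a speedup of 1 / (1 + c/100). The hypothetical helpers below do that arithmetic. Treat the results as rough estimates: criterion reports confidence intervals, and these use only the middle (point) values.

```rust
/// Converts a criterion-style percentage change (e.g. -58.222 for "-58.222%")
/// into a speedup factor, old_time / new_time.
fn speedup_from_change(change_pct: f64) -> f64 {
    1.0 / (1.0 + change_pct / 100.0)
}

/// Rough estimate of the baseline time, given the new time and the change.
fn implied_baseline(new_time: f64, change_pct: f64) -> f64 {
    new_time / (1.0 + change_pct / 100.0)
}
```

Applied to the median changes above, this gives speedups of roughly 2.39x (`sum`), 1.70x (`min`), 1.08x (`sum nulls`) and 1.11x (`min nulls`). For example, `sum 512` went from about 1283 ns to 536 ns.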
   
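   Command (by default, criterion compares each run with the previous one, so the second `cargo bench` reports the change relative to 29754d7):
   ```
   git checkout 29754d7 && cargo bench --bench aggregate_kernels && git checkout agg_arrow && cargo bench --bench aggregate_kernels
   ```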
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

