jorgecarleitao opened a new pull request #8165:
URL: https://github.com/apache/arrow/pull/8165
This PR speeds up some of the aggregations in arrow by 10-60% by simplifying
their logic and overall allowing the optimizer to do its work.
The first 3 commits (up to 29754d7) simply improve the benchmark itself by:
* not taking the creation of the arrays into account, only the computation,
* moving it to another file
* adding randomness to the data to reduce spurious results due to
speculative execution and others
* add case for data with nulls, since the kernels branch out on that
condition
The last 3 commits are the optimizations themselves.
```
sum 512 time: [535.66 ns 536.11 ns 536.57 ns]
change: [-58.421% -58.222% -57.957%] (p = 0.00 <
0.05)
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) high mild
6 (6.00%) high severe
min 512 time: [766.77 ns 775.85 ns 788.35 ns]
change: [-41.555% -41.017% -40.388%] (p = 0.00 <
0.05)
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) high mild
6 (6.00%) high severe
sum nulls 512 time: [1.0968 us 1.1000 us 1.1038 us]
change: [-8.9918% -7.6232% -5.7130%] (p = 0.00 <
0.05)
Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
3 (3.00%) high mild
12 (12.00%) high severe
min nulls 512 time: [1.3208 us 1.3242 us 1.3286 us]
change: [-11.028% -10.240% -9.4581%] (p = 0.00 <
0.05)
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
3 (3.00%) high mild
8 (8.00%) high severe
```
Command:
```
git checkout 29754d7 && cargo bench --bench aggregate_kernels && git
checkout agg_arrow && cargo bench --bench aggregate_kernels
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]