alamb commented on PR #5100:
URL: https://github.com/apache/arrow-rs/pull/5100#issuecomment-1827933052
Here are my performance results:
Machine:
```
Model Name: MacBook Pro
Model Identifier: Mac15,9
Model Number: Z1AH000VNLL/A
Chip: Apple M3 Max
Total Number of Cores: 16 (12 performance and 4 efficiency)
Memory: 64 GB
```
## `master` @ 61da64a0557c80af5bb43b5f15c6d8bb6a314cb2 with `simd` vs branch (both with nightly Rust)
<details><summary>Details</summary>
<p>
```
Running benches/aggregate_kernels.rs
(target/release/deps/aggregate_kernels-0072a695b99ab014)
Benchmarking float32/sum nonnull
Benchmarking float32/sum nonnull: Warming up for 3.0000 s
Benchmarking float32/sum nonnull: Collecting 100 samples in estimated 5.0218
s (773k iterations)
Benchmarking float32/sum nonnull: Analyzing
float32/sum nonnull time: [6.4869 µs 6.4915 µs 6.4973 µs]
thrpt: [37.576 GiB/s 37.609 GiB/s 37.636 GiB/s]
change:
time: [+112.84% +113.46% +114.05%] (p = 0.00 <
0.05)
thrpt: [-53.282% -53.152% -53.016%]
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) high mild
4 (4.00%) high severe
Benchmarking float32/min nonnull
Benchmarking float32/min nonnull: Warming up for 3.0000 s
Benchmarking float32/min nonnull: Collecting 100 samples in estimated 5.0622
s (232k iterations)
Benchmarking float32/min nonnull: Analyzing
float32/min nonnull time: [21.741 µs 21.756 µs 21.772 µs]
thrpt: [11.213 GiB/s 11.222 GiB/s 11.229 GiB/s]
change:
time: [+121.43% +122.23% +122.94%] (p = 0.00 <
0.05)
thrpt: [-55.146% -55.002% -54.839%]
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
6 (6.00%) high mild
4 (4.00%) high severe
Benchmarking float32/max nonnull
Benchmarking float32/max nonnull: Warming up for 3.0000 s
Benchmarking float32/max nonnull: Collecting 100 samples in estimated 5.0525
s (232k iterations)
Benchmarking float32/max nonnull: Analyzing
float32/max nonnull time: [21.489 µs 21.530 µs 21.578 µs]
thrpt: [11.315 GiB/s 11.340 GiB/s 11.361 GiB/s]
change:
time: [+216.76% +218.04% +219.36%] (p = 0.00 <
0.05)
thrpt: [-68.687% -68.557% -68.431%]
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low severe
2 (2.00%) high mild
1 (1.00%) high severe
Benchmarking float32/sum nullable
Benchmarking float32/sum nullable: Warming up for 3.0000 s
Benchmarking float32/sum nullable: Collecting 100 samples in estimated
5.0088 s (470k iterations)
Benchmarking float32/sum nullable: Analyzing
float32/sum nullable time: [10.645 µs 10.654 µs 10.663 µs]
thrpt: [22.897 GiB/s 22.916 GiB/s 22.935 GiB/s]
change:
time: [+129.35% +129.98% +130.61%] (p = 0.00 <
0.05)
thrpt: [-56.636% -56.517% -56.399%]
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) low mild
5 (5.00%) high mild
5 (5.00%) high severe
Benchmarking float32/min nullable
Benchmarking float32/min nullable: Warming up for 3.0000 s
Benchmarking float32/min nullable: Collecting 100 samples in estimated
5.1691 s (106k iterations)
Benchmarking float32/min nullable: Analyzing
float32/min nullable time: [48.697 µs 48.749 µs 48.809 µs]
thrpt: [5.0019 GiB/s 5.0081 GiB/s 5.0135 GiB/s]
change:
time: [+82.253% +83.059% +83.836%] (p = 0.00 <
0.05)
thrpt: [-45.604% -45.373% -45.131%]
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
4 (4.00%) high mild
7 (7.00%) high severe
Benchmarking float32/max nullable
Benchmarking float32/max nullable: Warming up for 3.0000 s
Benchmarking float32/max nullable: Collecting 100 samples in estimated
5.1709 s (106k iterations)
Benchmarking float32/max nullable: Analyzing
float32/max nullable time: [48.719 µs 48.793 µs 48.884 µs]
thrpt: [4.9943 GiB/s 5.0036 GiB/s 5.0112 GiB/s]
change:
time: [+102.72% +104.47% +106.07%] (p = 0.00 <
0.05)
thrpt: [-51.473% -51.094% -50.670%]
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
6 (6.00%) high mild
10 (10.00%) high severe
Benchmarking float64/sum nonnull
Benchmarking float64/sum nonnull: Warming up for 3.0000 s
Benchmarking float64/sum nonnull: Collecting 100 samples in estimated 5.0318
s (429k iterations)
Benchmarking float64/sum nonnull: Analyzing
float64/sum nonnull time: [11.717 µs 11.748 µs 11.777 µs]
thrpt: [41.462 GiB/s 41.562 GiB/s 41.674 GiB/s]
change:
time: [+96.573% +97.450% +98.222%] (p = 0.00 <
0.05)
thrpt: [-49.552% -49.354% -49.128%]
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severe
Benchmarking float64/min nonnull
Benchmarking float64/min nonnull: Warming up for 3.0000 s
Benchmarking float64/min nonnull: Collecting 100 samples in estimated 5.0319
s (86k iterations)
Benchmarking float64/min nonnull: Analyzing
float64/min nonnull time: [57.630 µs 57.765 µs 57.921 µs]
thrpt: [8.4301 GiB/s 8.4530 GiB/s 8.4727 GiB/s]
change:
time: [+196.14% +197.63% +199.35%] (p = 0.00 <
0.05)
thrpt: [-66.595% -66.402% -66.232%]
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe
Benchmarking float64/max nonnull
Benchmarking float64/max nonnull: Warming up for 3.0000 s
Benchmarking float64/max nonnull: Collecting 100 samples in estimated 5.0342
s (121k iterations)
Benchmarking float64/max nonnull: Analyzing
float64/max nonnull time: [41.669 µs 41.851 µs 42.026 µs]
thrpt: [11.619 GiB/s 11.667 GiB/s 11.718 GiB/s]
change:
time: [+204.20% +205.40% +206.66%] (p = 0.00 <
0.05)
thrpt: [-67.390% -67.256% -67.127%]
Performance has regressed.
Found 17 outliers among 100 measurements (17.00%)
6 (6.00%) low severe
3 (3.00%) low mild
2 (2.00%) high mild
6 (6.00%) high severe
Benchmarking float64/sum nullable
Benchmarking float64/sum nullable: Warming up for 3.0000 s
Benchmarking float64/sum nullable: Collecting 100 samples in estimated
5.0564 s (227k iterations)
Benchmarking float64/sum nullable: Analyzing
float64/sum nullable time: [22.209 µs 22.224 µs 22.241 µs]
thrpt: [21.954 GiB/s 21.971 GiB/s 21.985 GiB/s]
change:
time: [+138.22% +139.19% +140.17%] (p = 0.00 <
0.05)
thrpt: [-58.363% -58.192% -58.021%]
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
3 (3.00%) low severe
4 (4.00%) high mild
4 (4.00%) high severe
Benchmarking float64/min nullable
Benchmarking float64/min nullable: Warming up for 3.0000 s
Benchmarking float64/min nullable: Collecting 100 samples in estimated
5.4217 s (56k iterations)
Benchmarking float64/min nullable: Analyzing
float64/min nullable time: [97.439 µs 97.559 µs 97.693 µs]
thrpt: [4.9981 GiB/s 5.0050 GiB/s 5.0111 GiB/s]
change:
time: [+158.93% +160.03% +161.18%] (p = 0.00 <
0.05)
thrpt: [-61.712% -61.543% -61.380%]
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) high mild
4 (4.00%) high severe
Benchmarking float64/max nullable
Benchmarking float64/max nullable: Warming up for 3.0000 s
Benchmarking float64/max nullable: Collecting 100 samples in estimated
5.4214 s (56k iterations)
Benchmarking float64/max nullable: Analyzing
float64/max nullable time: [97.401 µs 97.493 µs 97.602 µs]
thrpt: [5.0028 GiB/s 5.0083 GiB/s 5.0131 GiB/s]
change:
time: [+202.23% +203.27% +204.73%] (p = 0.00 <
0.05)
thrpt: [-67.184% -67.026% -66.913%]
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) high mild
5 (5.00%) high severe
Benchmarking int8/sum nonnull
Benchmarking int8/sum nonnull: Warming up for 3.0000 s
Benchmarking int8/sum nonnull: Collecting 100 samples in estimated 5.0023 s
(9.3M iterations)
Benchmarking int8/sum nonnull: Analyzing
int8/sum nonnull time: [536.05 ns 536.60 ns 537.27 ns]
thrpt: [113.60 GiB/s 113.74 GiB/s 113.86 GiB/s]
change:
time: [-1.3393% -0.9518% -0.5531%] (p = 0.00 <
0.05)
thrpt: [+0.5561% +0.9609% +1.3575%]
Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
Benchmarking int8/min nonnull
Benchmarking int8/min nonnull: Warming up for 3.0000 s
Benchmarking int8/min nonnull: Collecting 100 samples in estimated 5.0002 s
(9.3M iterations)
Benchmarking int8/min nonnull: Analyzing
int8/min nonnull time: [535.70 ns 536.25 ns 536.80 ns]
thrpt: [113.70 GiB/s 113.82 GiB/s 113.94 GiB/s]
change:
time: [-98.979% -98.976% -98.973%] (p = 0.00 <
0.05)
thrpt: [+9633.9% +9662.2% +9693.3%]
Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
1 (1.00%) low mild
7 (7.00%) high mild
5 (5.00%) high severe
Benchmarking int8/max nonnull
Benchmarking int8/max nonnull: Warming up for 3.0000 s
Benchmarking int8/max nonnull: Collecting 100 samples in estimated 5.0007 s
(9.3M iterations)
Benchmarking int8/max nonnull: Analyzing
int8/max nonnull time: [535.67 ns 536.06 ns 536.49 ns]
thrpt: [113.77 GiB/s 113.86 GiB/s 113.94 GiB/s]
change:
time: [-98.965% -98.962% -98.959%] (p = 0.00 <
0.05)
thrpt: [+9503.6% +9532.7% +9563.0%]
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low severe
5 (5.00%) high mild
4 (4.00%) high severe
Benchmarking int8/sum nullable
Benchmarking int8/sum nullable: Warming up for 3.0000 s
Benchmarking int8/sum nullable: Collecting 100 samples in estimated 5.0232 s
(707k iterations)
Benchmarking int8/sum nullable: Analyzing
int8/sum nullable time: [7.0953 µs 7.1011 µs 7.1070 µs]
thrpt: [8.5881 GiB/s 8.5952 GiB/s 8.6022 GiB/s]
change:
time: [+87.353% +88.096% +88.866%] (p = 0.00 <
0.05)
thrpt: [-47.052% -46.836% -46.625%]
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
1 (1.00%) low mild
6 (6.00%) high mild
7 (7.00%) high severe
Benchmarking int8/min nullable
Benchmarking int8/min nullable: Warming up for 3.0000 s
Benchmarking int8/min nullable: Collecting 100 samples in estimated 5.0079 s
(631k iterations)
Benchmarking int8/min nullable: Analyzing
int8/min nullable time: [7.9245 µs 7.9300 µs 7.9357 µs]
thrpt: [7.6912 GiB/s 7.6968 GiB/s 7.7021 GiB/s]
change:
time: [-79.180% -79.104% -79.028%] (p = 0.00 <
0.05)
thrpt: [+376.83% +378.56% +380.30%]
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low severe
1 (1.00%) low mild
7 (7.00%) high mild
3 (3.00%) high severe
Benchmarking int8/max nullable
Benchmarking int8/max nullable: Warming up for 3.0000 s
Benchmarking int8/max nullable: Collecting 100 samples in estimated 5.0048 s
(631k iterations)
Benchmarking int8/max nullable: Analyzing
int8/max nullable time: [7.9373 µs 7.9456 µs 7.9539 µs]
thrpt: [7.6736 GiB/s 7.6816 GiB/s 7.6897 GiB/s]
change:
time: [-79.127% -79.063% -79.000%] (p = 0.00 <
0.05)
thrpt: [+376.18% +377.62% +379.10%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low mild
5 (5.00%) high mild
2 (2.00%) high severe
Benchmarking int16/sum nonnull
Benchmarking int16/sum nonnull: Warming up for 3.0000 s
Benchmarking int16/sum nonnull: Collecting 100 samples in estimated 5.0033 s
(4.6M iterations)
Benchmarking int16/sum nonnull: Analyzing
int16/sum nonnull time: [1.3119 µs 1.3372 µs 1.3573 µs]
thrpt: [89.937 GiB/s 91.286 GiB/s 93.047 GiB/s]
change:
time: [+8.5860% +10.958% +13.451%] (p = 0.00 <
0.05)
thrpt: [-11.856% -9.8760% -7.9071%]
Performance has regressed.
Benchmarking int16/min nonnull
Benchmarking int16/min nonnull: Warming up for 3.0000 s
Benchmarking int16/min nonnull: Collecting 100 samples in estimated 5.0022 s
(3.3M iterations)
Benchmarking int16/min nonnull: Analyzing
int16/min nonnull time: [1.3759 µs 1.3845 µs 1.3915 µs]
thrpt: [87.725 GiB/s 88.170 GiB/s 88.721 GiB/s]
change:
time: [+17.717% +19.400% +20.696%] (p = 0.00 <
0.05)
thrpt: [-17.147% -16.248% -15.050%]
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
11 (11.00%) low severe
1 (1.00%) low mild
1 (1.00%) high mild
3 (3.00%) high severe
Benchmarking int16/max nonnull
Benchmarking int16/max nonnull: Warming up for 3.0000 s
Benchmarking int16/max nonnull: Collecting 100 samples in estimated 5.0056 s
(3.6M iterations)
Benchmarking int16/max nonnull: Analyzing
int16/max nonnull time: [1.3821 µs 1.3867 µs 1.3905 µs]
thrpt: [87.786 GiB/s 88.031 GiB/s 88.324 GiB/s]
change:
time: [+18.013% +19.315% +20.517%] (p = 0.00 <
0.05)
thrpt: [-17.024% -16.188% -15.263%]
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
9 (9.00%) low severe
2 (2.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severe
Benchmarking int16/sum nullable
Benchmarking int16/sum nullable: Warming up for 3.0000 s
Benchmarking int16/sum nullable: Collecting 100 samples in estimated 5.0016
s (510k iterations)
Benchmarking int16/sum nullable: Analyzing
int16/sum nullable time: [9.6419 µs 9.7203 µs 9.7834 µs]
thrpt: [12.477 GiB/s 12.558 GiB/s 12.660 GiB/s]
change:
time: [+129.75% +132.40% +134.85%] (p = 0.00 <
0.05)
thrpt: [-57.419% -56.971% -56.475%]
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
12 (12.00%) low severe
Benchmarking int16/min nullable
Benchmarking int16/min nullable: Warming up for 3.0000 s
Benchmarking int16/min nullable: Collecting 100 samples in estimated 5.0587
s (303k iterations)
Benchmarking int16/min nullable: Analyzing
int16/min nullable time: [16.504 µs 16.619 µs 16.709 µs]
thrpt: [7.3057 GiB/s 7.3451 GiB/s 7.3963 GiB/s]
change:
time: [+138.61% +139.53% +140.32%] (p = 0.00 <
0.05)
thrpt: [-58.389% -58.252% -58.090%]
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
5 (5.00%) low severe
1 (1.00%) low mild
Benchmarking int16/max nullable
Benchmarking int16/max nullable: Warming up for 3.0000 s
Benchmarking int16/max nullable: Collecting 100 samples in estimated 5.0505
s (303k iterations)
Benchmarking int16/max nullable: Analyzing
int16/max nullable time: [16.333 µs 18.991 µs 24.750 µs]
thrpt: [4.9321 GiB/s 6.4277 GiB/s 7.4736 GiB/s]
change:
time: [+136.19% +155.80% +193.27%] (p = 0.00 <
0.05)
thrpt: [-65.902% -60.907% -57.661%]
Performance has regressed.
Found 19 outliers among 100 measurements (19.00%)
11 (11.00%) low severe
5 (5.00%) low mild
3 (3.00%) high severe
Benchmarking int32/sum nonnull
Benchmarking int32/sum nonnull: Warming up for 3.0000 s
Benchmarking int32/sum nonnull: Collecting 100 samples in estimated 5.0076 s
(1.6M iterations)
Benchmarking int32/sum nonnull: Analyzing
int32/sum nonnull time: [3.1533 µs 3.1678 µs 3.1781 µs]
thrpt: [76.819 GiB/s 77.069 GiB/s 77.424 GiB/s]
change:
time: [+29.275% +30.126% +30.900%] (p = 0.00 <
0.05)
thrpt: [-23.606% -23.152% -22.646%]
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) low severe
2 (2.00%) low mild
Benchmarking int32/min nonnull
Benchmarking int32/min nonnull: Warming up for 3.0000 s
Benchmarking int32/min nonnull: Collecting 100 samples in estimated 5.0153 s
(1.6M iterations)
Benchmarking int32/min nonnull: Analyzing
int32/min nonnull time: [3.1498 µs 3.1651 µs 3.1780 µs]
thrpt: [76.822 GiB/s 77.134 GiB/s 77.510 GiB/s]
change:
time: [+29.048% +29.399% +29.750%] (p = 0.00 <
0.05)
thrpt: [-22.929% -22.720% -22.509%]
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) low severe
Benchmarking int32/max nonnull
Benchmarking int32/max nonnull: Warming up for 3.0000 s
Benchmarking int32/max nonnull: Collecting 100 samples in estimated 5.0029 s
(1.6M iterations)
Benchmarking int32/max nonnull: Analyzing
int32/max nonnull time: [3.1712 µs 3.1782 µs 3.1846 µs]
thrpt: [76.662 GiB/s 76.816 GiB/s 76.987 GiB/s]
change:
time: [+29.378% +29.692% +29.959%] (p = 0.00 <
0.05)
thrpt: [-23.053% -22.894% -22.707%]
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild
Benchmarking int32/sum nullable
Benchmarking int32/sum nullable: Warming up for 3.0000 s
Benchmarking int32/sum nullable: Collecting 100 samples in estimated 5.0491
s (444k iterations)
Benchmarking int32/sum nullable: Analyzing
int32/sum nullable time: [11.295 µs 11.343 µs 11.383 µs]
thrpt: [21.448 GiB/s 21.523 GiB/s 21.615 GiB/s]
change:
time: [+142.74% +144.27% +145.38%] (p = 0.00 <
0.05)
thrpt: [-59.246% -59.062% -58.803%]
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
6 (6.00%) low severe
Benchmarking int32/min nullable
Benchmarking int32/min nullable: Warming up for 3.0000 s
Benchmarking int32/min nullable: Collecting 100 samples in estimated 5.1615
s (136k iterations)
Benchmarking int32/min nullable: Analyzing
int32/min nullable time: [37.962 µs 38.004 µs 38.045 µs]
thrpt: [6.4172 GiB/s 6.4240 GiB/s 6.4312 GiB/s]
change:
time: [+43.232% +44.819% +46.001%] (p = 0.00 <
0.05)
thrpt: [-31.507% -30.948% -30.183%]
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
5 (5.00%) low severe
2 (2.00%) high severe
Benchmarking int32/max nullable
Benchmarking int32/max nullable: Warming up for 3.0000 s
Benchmarking int32/max nullable: Collecting 100 samples in estimated 5.1674
s (136k iterations)
Benchmarking int32/max nullable: Analyzing
int32/max nullable time: [37.742 µs 37.873 µs 37.973 µs]
thrpt: [6.4293 GiB/s 6.4463 GiB/s 6.4686 GiB/s]
change:
time: [+44.745% +45.470% +46.065%] (p = 0.00 <
0.05)
thrpt: [-31.537% -31.257% -30.913%]
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) low severe
2 (2.00%) high mild
Benchmarking int64/sum nonnull
Benchmarking int64/sum nonnull: Warming up for 3.0000 s
Benchmarking int64/sum nonnull: Collecting 100 samples in estimated 5.0113 s
(793k iterations)
Benchmarking int64/sum nonnull: Analyzing
int64/sum nonnull time: [6.2206 µs 6.2618 µs 6.2947 µs]
thrpt: [77.570 GiB/s 77.977 GiB/s 78.495 GiB/s]
change:
time: [+28.297% +29.368% +30.379%] (p = 0.00 <
0.05)
thrpt: [-23.300% -22.701% -22.056%]
Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
17 (17.00%) low severe
1 (1.00%) high mild
Benchmarking int64/min nonnull
Benchmarking int64/min nonnull: Warming up for 3.0000 s
Benchmarking int64/min nonnull: Collecting 100 samples in estimated 5.0368 s
(439k iterations)
Benchmarking int64/min nonnull: Analyzing
int64/min nonnull time: [11.457 µs 11.524 µs 11.565 µs]
thrpt: [42.221 GiB/s 42.369 GiB/s 42.619 GiB/s]
change:
time: [+27.465% +27.939% +28.306%] (p = 0.00 <
0.05)
thrpt: [-22.061% -21.838% -21.547%]
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low severe
1 (1.00%) low mild
2 (2.00%) high mild
Benchmarking int64/max nonnull
Benchmarking int64/max nonnull: Warming up for 3.0000 s
Benchmarking int64/max nonnull: Collecting 100 samples in estimated 5.0537 s
(439k iterations)
Benchmarking int64/max nonnull: Analyzing
int64/max nonnull time: [11.472 µs 11.518 µs 11.548 µs]
thrpt: [42.283 GiB/s 42.393 GiB/s 42.563 GiB/s]
change:
time: [+27.326% +27.832% +28.178%] (p = 0.00 <
0.05)
thrpt: [-21.983% -21.772% -21.461%]
Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) low severe
Benchmarking int64/sum nullable
Benchmarking int64/sum nullable: Warming up for 3.0000 s
Benchmarking int64/sum nullable: Collecting 100 samples in estimated 5.0576
s (227k iterations)
Benchmarking int64/sum nullable: Analyzing
int64/sum nullable time: [22.159 µs 22.231 µs 22.286 µs]
thrpt: [21.910 GiB/s 21.964 GiB/s 22.036 GiB/s]
change:
time: [+138.44% +139.39% +140.19%] (p = 0.00 <
0.05)
thrpt: [-58.366% -58.228% -58.061%]
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) low severe
1 (1.00%) low mild
Benchmarking int64/min nullable
Benchmarking int64/min nullable: Warming up for 3.0000 s
Benchmarking int64/min nullable: Collecting 100 samples in estimated 5.1037
s (91k iterations)
Benchmarking int64/min nullable: Analyzing
int64/min nullable time: [56.135 µs 56.211 µs 56.289 µs]
thrpt: [8.6745 GiB/s 8.6865 GiB/s 8.6983 GiB/s]
change:
time: [+103.67% +104.18% +104.58%] (p = 0.00 <
0.05)
thrpt: [-51.120% -51.024% -50.902%]
Performance has regressed.
Benchmarking int64/max nullable
Benchmarking int64/max nullable: Warming up for 3.0000 s
Benchmarking int64/max nullable: Collecting 100 samples in estimated 5.1050
s (91k iterations)
Benchmarking int64/max nullable: Analyzing
int64/max nullable time: [56.167 µs 56.277 µs 56.382 µs]
thrpt: [8.6603 GiB/s 8.6764 GiB/s 8.6934 GiB/s]
change:
time: [+102.92% +103.74% +104.42%] (p = 0.00 <
0.05)
thrpt: [-51.082% -50.919% -50.720%]
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) low severe
Benchmarking string/min nonnull
Benchmarking string/min nonnull: Warming up for 3.0000 s
Benchmarking string/min nonnull: Collecting 100 samples in estimated 5.0352
s (30k iterations)
Benchmarking string/min nonnull: Analyzing
string/min nonnull time: [155.50 µs 156.52 µs 157.37 µs]
thrpt: [416.44 Melem/s 418.70 Melem/s 421.46
Melem/s]
change:
time: [+23.742% +25.113% +26.286%] (p = 0.00 <
0.05)
thrpt: [-20.815% -20.072% -19.186%]
Performance has regressed.
Found 22 outliers among 100 measurements (22.00%)
19 (19.00%) low severe
2 (2.00%) high mild
1 (1.00%) high severe
Benchmarking string/max nonnull
Benchmarking string/max nonnull: Warming up for 3.0000 s
Benchmarking string/max nonnull: Collecting 100 samples in estimated 5.5497
s (35k iterations)
Benchmarking string/max nonnull: Analyzing
string/max nonnull time: [156.72 µs 157.11 µs 157.45 µs]
thrpt: [416.24 Melem/s 417.12 Melem/s 418.16
Melem/s]
change:
time: [+11.214% +11.736% +12.164%] (p = 0.00 <
0.05)
thrpt: [-10.844% -10.504% -10.083%]
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) low severe
1 (1.00%) low mild
1 (1.00%) high mild
Benchmarking string/min nullable
Benchmarking string/min nullable: Warming up for 3.0000 s
Benchmarking string/min nullable: Collecting 100 samples in estimated 5.0964
s (45k iterations)
Benchmarking string/min nullable: Analyzing
string/min nullable time: [111.77 µs 112.19 µs 112.73 µs]
thrpt: [581.36 Melem/s 584.15 Melem/s 586.33
Melem/s]
change:
time: [+27.574% +27.967% +28.391%] (p = 0.00 <
0.05)
thrpt: [-22.113% -21.855% -21.614%]
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low severe
1 (1.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severe
Benchmarking string/max nullable
Benchmarking string/max nullable: Warming up for 3.0000 s
Benchmarking string/max nullable: Collecting 100 samples in estimated 5.5168
s (50k iterations)
Benchmarking string/max nullable: Analyzing
string/max nullable time: [108.08 µs 108.89 µs 109.66 µs]
thrpt: [597.64 Melem/s 601.87 Melem/s 606.39
Melem/s]
change:
time: [+26.078% +26.963% +27.772%] (p = 0.00 <
0.05)
thrpt: [-21.736% -21.237% -20.684%]
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
8 (8.00%) low severe
2 (2.00%) high mild
1 (1.00%) high severe
```
</p>
</details>
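A note for reading the `change:` lines in the output above: the `thrpt` change looks like the `time` change re-expressed, since throughput is inversely proportional to time for a fixed input size. The sketch below shows that arithmetic. The function names are hypothetical (they are not part of criterion or arrow-rs), and the relation is inferred from the reported numbers rather than taken from criterion's source.

```rust
/// Convert a relative change in time into the implied relative change in
/// throughput, assuming the amount of data processed per iteration is fixed.
///
/// `time_change` is a fraction, so +113.46% is passed as 1.1346 and
/// -98.976% as -0.98976.
fn throughput_change(time_change: f64) -> f64 {
    // new_time = old_time * (1 + time_change)
    // new_thrpt = old_thrpt / (1 + time_change)
    1.0 / (1.0 + time_change) - 1.0
}

/// Ratio of new time to old time: values above 1.0 are a slowdown,
/// values below 1.0 are a speedup.
fn time_ratio(time_change: f64) -> f64 {
    1.0 + time_change
}
```

For example, `throughput_change(1.1346)` is about -0.5315, in line with the -53.152% reported for `float32/sum nonnull`, and `time_ratio(1.1346)` says that benchmark takes roughly 2.13x as long. Small differences in the last digits are expected because the reported percentages are rounded estimates.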
## `master` @ 61da64a0557c80af5bb43b5f15c6d8bb6a314cb2 with `simd` (nightly Rust) vs branch (stable Rust 1.73)
<details><summary>Details</summary>
<p>
```
Running benches/aggregate_kernels.rs
(target/release/deps/aggregate_kernels-9282db2d205ca86c)
Benchmarking float32/sum nonnull
Benchmarking float32/sum nonnull: Warming up for 3.0000 s
Benchmarking float32/sum nonnull: Collecting 100 samples in estimated 5.0114
s (808k iterations)
Benchmarking float32/sum nonnull: Analyzing
float32/sum nonnull time: [6.2821 µs 6.3453 µs 6.4165 µs]
thrpt: [38.049 GiB/s 38.476 GiB/s 38.863 GiB/s]
change:
time: [+61.648% +63.000% +64.486%] (p = 0.00 <
0.05)
thrpt: [-39.205% -38.650% -38.137%]
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
2 (2.00%) high mild
14 (14.00%) high severe
Benchmarking float32/min nonnull
Benchmarking float32/min nonnull: Warming up for 3.0000 s
Benchmarking float32/min nonnull: Collecting 100 samples in estimated 5.1040
s (237k iterations)
Benchmarking float32/min nonnull: Analyzing
float32/min nonnull time: [20.695 µs 20.741 µs 20.796 µs]
thrpt: [11.740 GiB/s 11.771 GiB/s 11.797 GiB/s]
change:
time: [+65.066% +66.458% +68.206%] (p = 0.00 <
0.05)
thrpt: [-40.549% -39.925% -39.418%]
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
7 (7.00%) high mild
6 (6.00%) high severe
Benchmarking float32/max nonnull
Benchmarking float32/max nonnull: Warming up for 3.0000 s
Benchmarking float32/max nonnull: Collecting 100 samples in estimated 5.0635
s (247k iterations)
Benchmarking float32/max nonnull: Analyzing
float32/max nonnull time: [20.303 µs 20.329 µs 20.360 µs]
thrpt: [11.991 GiB/s 12.009 GiB/s 12.025 GiB/s]
change:
time: [+133.51% +134.13% +134.98%] (p = 0.00 <
0.05)
thrpt: [-57.443% -57.290% -57.176%]
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
4 (4.00%) high mild
5 (5.00%) high severe
Benchmarking float32/sum nullable
Benchmarking float32/sum nullable: Warming up for 3.0000 s
Benchmarking float32/sum nullable: Collecting 100 samples in estimated
5.0128 s (475k iterations)
Benchmarking float32/sum nullable: Analyzing
float32/sum nullable time: [10.589 µs 10.637 µs 10.699 µs]
thrpt: [22.818 GiB/s 22.951 GiB/s 23.056 GiB/s]
change:
time: [+78.695% +79.646% +80.680%] (p = 0.00 <
0.05)
thrpt: [-44.654% -44.335% -44.039%]
Performance has regressed.
Found 17 outliers among 100 measurements (17.00%)
1 (1.00%) low mild
4 (4.00%) high mild
12 (12.00%) high severe
Benchmarking float32/min nullable
Benchmarking float32/min nullable: Warming up for 3.0000 s
Benchmarking float32/min nullable: Collecting 100 samples in estimated
5.1710 s (106k iterations)
Benchmarking float32/min nullable: Analyzing
float32/min nullable time: [48.715 µs 48.766 µs 48.822 µs]
thrpt: [5.0007 GiB/s 5.0064 GiB/s 5.0116 GiB/s]
change:
time: [+33.507% +33.944% +34.497%] (p = 0.00 <
0.05)
thrpt: [-25.649% -25.342% -25.098%]
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
4 (4.00%) high mild
5 (5.00%) high severe
Benchmarking float32/max nullable
Benchmarking float32/max nullable: Warming up for 3.0000 s
Benchmarking float32/max nullable: Collecting 100 samples in estimated
5.0798 s (101k iterations)
Benchmarking float32/max nullable: Analyzing
float32/max nullable time: [48.964 µs 49.221 µs 49.548 µs]
thrpt: [4.9273 GiB/s 4.9601 GiB/s 4.9862 GiB/s]
change:
time: [+50.526% +51.990% +53.631%] (p = 0.00 <
0.05)
thrpt: [-34.909% -34.206% -33.566%]
Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
4 (4.00%) high mild
11 (11.00%) high severe
Benchmarking float64/sum nonnull
Benchmarking float64/sum nonnull: Warming up for 3.0000 s
Benchmarking float64/sum nonnull: Collecting 100 samples in estimated 5.0237
s (439k iterations)
Benchmarking float64/sum nonnull: Analyzing
float64/sum nonnull time: [11.421 µs 11.552 µs 11.708 µs]
thrpt: [41.704 GiB/s 42.269 GiB/s 42.751 GiB/s]
change:
time: [+47.672% +49.203% +50.975%] (p = 0.00 <
0.05)
thrpt: [-33.764% -32.977% -32.283%]
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) high mild
10 (10.00%) high severe
Benchmarking float64/min nonnull
Benchmarking float64/min nonnull: Warming up for 3.0000 s
Benchmarking float64/min nonnull: Collecting 100 samples in estimated 5.0197
s (86k iterations)
Benchmarking float64/min nonnull: Analyzing
float64/min nonnull time: [57.389 µs 57.690 µs 58.120 µs]
thrpt: [8.4013 GiB/s 8.4638 GiB/s 8.5083 GiB/s]
change:
time: [+204.63% +209.91% +214.99%] (p = 0.00 <
0.05)
thrpt: [-68.253% -67.732% -67.173%]
Performance has regressed.
Found 19 outliers among 100 measurements (19.00%)
1 (1.00%) high mild
18 (18.00%) high severe
Benchmarking float64/max nonnull
Benchmarking float64/max nonnull: Warming up for 3.0000 s
Benchmarking float64/max nonnull: Collecting 100 samples in estimated 5.1424
s (126k iterations)
Benchmarking float64/max nonnull: Analyzing
float64/max nonnull time: [40.828 µs 40.897 µs 40.968 µs]
thrpt: [11.919 GiB/s 11.939 GiB/s 11.959 GiB/s]
change:
time: [+216.42% +218.01% +219.31%] (p = 0.00 <
0.05)
thrpt: [-68.683% -68.555% -68.397%]
Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high severe
Benchmarking float64/sum nullable
Benchmarking float64/sum nullable: Warming up for 3.0000 s
Benchmarking float64/sum nullable: Collecting 100 samples in estimated
5.0847 s (227k iterations)
Benchmarking float64/sum nullable: Analyzing
float64/sum nullable time: [22.622 µs 22.714 µs 22.811 µs]
thrpt: [21.405 GiB/s 21.497 GiB/s 21.585 GiB/s]
change:
time: [+161.10% +162.73% +164.31%] (p = 0.00 <
0.05)
thrpt: [-62.165% -61.938% -61.701%]
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
1 (1.00%) low mild
7 (7.00%) high mild
1 (1.00%) high severe
Benchmarking float64/min nullable
Benchmarking float64/min nullable: Warming up for 3.0000 s
Benchmarking float64/min nullable: Collecting 100 samples in estimated
5.0221 s (50k iterations)
Benchmarking float64/min nullable: Analyzing
float64/min nullable time: [98.731 µs 99.523 µs 100.38 µs]
thrpt: [4.8643 GiB/s 4.9062 GiB/s 4.9456 GiB/s]
change:
time: [+173.41% +175.20% +176.95%] (p = 0.00 <
0.05)
thrpt: [-63.892% -63.663% -63.425%]
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
5 (5.00%) high mild
2 (2.00%) high severe
Benchmarking float64/max nullable
Benchmarking float64/max nullable: Warming up for 3.0000 s
Benchmarking float64/max nullable: Collecting 100 samples in estimated
5.0448 s (50k iterations)
Benchmarking float64/max nullable: Analyzing
float64/max nullable time: [98.333 µs 98.718 µs 99.144 µs]
thrpt: [4.9250 GiB/s 4.9462 GiB/s 4.9656 GiB/s]
change:
time: [+225.06% +226.85% +228.72%] (p = 0.00 <
0.05)
thrpt: [-69.579% -69.405% -69.236%]
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe
Benchmarking int8/sum nonnull
Benchmarking int8/sum nonnull: Warming up for 3.0000 s
Benchmarking int8/sum nonnull: Collecting 100 samples in estimated 5.0006 s
(9.3M iterations)
Benchmarking int8/sum nonnull: Analyzing
int8/sum nonnull time: [538.87 ns 540.39 ns 542.10 ns]
thrpt: [112.59 GiB/s 112.95 GiB/s 113.27 GiB/s]
change:
time: [+5.8410% +6.3350% +6.7450%] (p = 0.00 <
0.05)
thrpt: [-6.3188% -5.9576% -5.5187%]
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe
Benchmarking int8/min nonnull
Benchmarking int8/min nonnull: Warming up for 3.0000 s
Benchmarking int8/min nonnull: Collecting 100 samples in estimated 5.0010 s
(9.2M iterations)
Benchmarking int8/min nonnull: Analyzing
int8/min nonnull time: [539.84 ns 540.98 ns 542.21 ns]
thrpt: [112.57 GiB/s 112.82 GiB/s 113.06 GiB/s]
change:
time: [-98.907% -98.901% -98.895%] (p = 0.00 <
0.05)
thrpt: [+8951.4% +8998.4% +9050.6%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) low severe
2 (2.00%) high mild
4 (4.00%) high severe
Benchmarking int8/max nonnull
Benchmarking int8/max nonnull: Warming up for 3.0000 s
Benchmarking int8/max nonnull: Collecting 100 samples in estimated 5.0014 s
(9.3M iterations)
Benchmarking int8/max nonnull: Analyzing
int8/max nonnull time: [538.51 ns 539.40 ns 540.38 ns]
thrpt: [112.95 GiB/s 113.15 GiB/s 113.34 GiB/s]
change:
time: [-98.893% -98.888% -98.884%] (p = 0.00 <
0.05)
thrpt: [+8861.5% +8896.5% +8936.3%]
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
6 (6.00%) high mild
3 (3.00%) high severe
Benchmarking int8/sum nullable
Benchmarking int8/sum nullable: Warming up for 3.0000 s
Benchmarking int8/sum nullable: Collecting 100 samples in estimated 5.0027 s
(702k iterations)
Benchmarking int8/sum nullable: Analyzing
int8/sum nullable time: [7.0904 µs 7.0965 µs 7.1035 µs]
thrpt: [8.5923 GiB/s 8.6008 GiB/s 8.6082 GiB/s]
change:
time: [+97.769% +98.683% +99.370%] (p = 0.00 <
0.05)
thrpt: [-49.842% -49.669% -49.436%]
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low severe
4 (4.00%) high mild
2 (2.00%) high severe
Benchmarking int8/min nullable
Benchmarking int8/min nullable: Warming up for 3.0000 s
Benchmarking int8/min nullable: Collecting 100 samples in estimated 5.0248 s
(636k iterations)
Benchmarking int8/min nullable: Analyzing
int8/min nullable time: [7.8877 µs 7.9023 µs 7.9149 µs]
thrpt: [7.7114 GiB/s 7.7237 GiB/s 7.7380 GiB/s]
change:
time: [-78.217% -78.112% -78.025%] (p = 0.00 <
0.05)
thrpt: [+355.06% +356.88% +359.07%]
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) low mild
1 (1.00%) high severe
Benchmarking int8/max nullable
Benchmarking int8/max nullable: Warming up for 3.0000 s
Benchmarking int8/max nullable: Collecting 100 samples in estimated 5.0380 s
(626k iterations)
Benchmarking int8/max nullable: Analyzing
int8/max nullable time: [8.0177 µs 8.0409 µs 8.0667 µs]
thrpt: [7.5663 GiB/s 7.5906 GiB/s 7.6125 GiB/s]
change:
time: [-77.679% -77.566% -77.463%] (p = 0.00 <
0.05)
thrpt: [+343.72% +345.75% +348.00%]
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe
Benchmarking int16/sum nonnull
Benchmarking int16/sum nonnull: Warming up for 3.0000 s
Benchmarking int16/sum nonnull: Collecting 100 samples in estimated 5.0034 s
(4.6M iterations)
Benchmarking int16/sum nonnull: Analyzing
int16/sum nonnull time: [1.0893 µs 1.0911 µs 1.0932 µs]
thrpt: [111.66 GiB/s 111.88 GiB/s 112.07 GiB/s]
change:
time: [+4.5498% +5.0884% +5.5881%] (p = 0.00 <
0.05)
thrpt: [-5.2924% -4.8420% -4.3518%]
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe
Benchmarking int16/min nonnull
Benchmarking int16/min nonnull: Warming up for 3.0000 s
Benchmarking int16/min nonnull: Collecting 100 samples in estimated 5.0053 s
(4.6M iterations)
Benchmarking int16/min nonnull: Analyzing
int16/min nonnull time: [1.0906 µs 1.0936 µs 1.0969 µs]
thrpt: [111.29 GiB/s 111.62 GiB/s 111.93 GiB/s]
change:
time: [-1.3424% -0.6267% +0.0176%] (p = 0.07 >
0.05)
thrpt: [-0.0176% +0.6307% +1.3607%]
No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low severe
4 (4.00%) high mild
5 (5.00%) high severe
Benchmarking int16/max nonnull
Benchmarking int16/max nonnull: Warming up for 3.0000 s
Benchmarking int16/max nonnull: Collecting 100 samples in estimated 5.0037 s
(4.5M iterations)
Benchmarking int16/max nonnull: Analyzing
int16/max nonnull time: [1.0981 µs 1.1014 µs 1.1051 µs]
thrpt: [110.46 GiB/s 110.83 GiB/s 111.17 GiB/s]
change:
time: [+1.0907% +1.6125% +2.0946%] (p = 0.00 <
0.05)
thrpt: [-2.0516% -1.5869% -1.0789%]
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low mild
2 (2.00%) high mild
Benchmarking int16/sum nullable
Benchmarking int16/sum nullable: Warming up for 3.0000 s
Benchmarking int16/sum nullable: Collecting 100 samples in estimated 5.0260
s (641k iterations)
Benchmarking int16/sum nullable: Analyzing
int16/sum nullable time: [7.6647 µs 7.6795 µs 7.6975 µs]
thrpt: [15.858 GiB/s 15.896 GiB/s 15.926 GiB/s]
change:
time: [+83.613% +85.760% +87.753%] (p = 0.00 <
0.05)
thrpt: [-46.739% -46.167% -45.538%]
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe
Benchmarking int16/min nullable
Benchmarking int16/min nullable: Warming up for 3.0000 s
Benchmarking int16/min nullable: Collecting 100 samples in estimated 5.0478
s (389k iterations)
Benchmarking int16/min nullable: Analyzing
int16/min nullable time: [13.082 µs 13.132 µs 13.184 µs]
thrpt: [9.2592 GiB/s 9.2959 GiB/s 9.3310 GiB/s]
change:
time: [+98.383% +99.218% +99.994%] (p = 0.00 <
0.05)
thrpt: [-49.999% -49.804% -49.593%]
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
1 (1.00%) low severe
4 (4.00%) low mild
6 (6.00%) high mild
2 (2.00%) high severe
Benchmarking int16/max nullable
Benchmarking int16/max nullable: Warming up for 3.0000 s
Benchmarking int16/max nullable: Collecting 100 samples in estimated 5.0205
s (384k iterations)
Benchmarking int16/max nullable: Analyzing
int16/max nullable time: [13.091 µs 13.122 µs 13.153 µs]
thrpt: [9.2811 GiB/s 9.3029 GiB/s 9.3244 GiB/s]
change:
time: [+91.868% +93.748% +95.419%] (p = 0.00 <
0.05)
thrpt: [-48.828% -48.387% -47.881%]
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
2 (2.00%) low severe
1 (1.00%) low mild
7 (7.00%) high mild
2 (2.00%) high severe
Benchmarking int32/sum nonnull
Benchmarking int32/sum nonnull: Warming up for 3.0000 s
Benchmarking int32/sum nonnull: Collecting 100 samples in estimated 5.0060 s
(2.0M iterations)
Benchmarking int32/sum nonnull: Analyzing
int32/sum nonnull time: [2.4976 µs 2.5039 µs 2.5107 µs]
thrpt: [97.242 GiB/s 97.505 GiB/s 97.751 GiB/s]
change:
time: [+2.8546% +3.1854% +3.4974%] (p = 0.00 <
0.05)
thrpt: [-3.3792% -3.0871% -2.7753%]
Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
Benchmarking int32/min nonnull
Benchmarking int32/min nonnull: Warming up for 3.0000 s
Benchmarking int32/min nonnull: Collecting 100 samples in estimated 5.0091 s
(2.0M iterations)
Benchmarking int32/min nonnull: Analyzing
int32/min nonnull time: [2.4869 µs 2.4930 µs 2.4997 µs]
thrpt: [97.668 GiB/s 97.930 GiB/s 98.169 GiB/s]
change:
time: [+1.7370% +2.0986% +2.4353%] (p = 0.00 <
0.05)
thrpt: [-2.3774% -2.0554% -1.7073%]
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
9 (9.00%) high mild
2 (2.00%) high severe
Benchmarking int32/max nonnull
Benchmarking int32/max nonnull: Warming up for 3.0000 s
Benchmarking int32/max nonnull: Collecting 100 samples in estimated 5.0063 s
(2.0M iterations)
Benchmarking int32/max nonnull: Analyzing
int32/max nonnull time: [2.5015 µs 2.5094 µs 2.5184 µs]
thrpt: [96.944 GiB/s 97.291 GiB/s 97.597 GiB/s]
change:
time: [+2.5025% +2.8808% +3.3134%] (p = 0.00 <
0.05)
thrpt: [-3.2072% -2.8001% -2.4414%]
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
6 (6.00%) high mild
1 (1.00%) high severe
Benchmarking int32/sum nullable
Benchmarking int32/sum nullable: Warming up for 3.0000 s
Benchmarking int32/sum nullable: Collecting 100 samples in estimated 5.0198
s (561k iterations)
Benchmarking int32/sum nullable: Analyzing
int32/sum nullable time: [8.9102 µs 8.9258 µs 8.9431 µs]
thrpt: [27.299 GiB/s 27.352 GiB/s 27.400 GiB/s]
change:
time: [+91.210% +91.759% +92.305%] (p = 0.00 <
0.05)
thrpt: [-47.999% -47.851% -47.701%]
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) high mild
4 (4.00%) high severe
Benchmarking int32/min nullable
Benchmarking int32/min nullable: Warming up for 3.0000 s
Benchmarking int32/min nullable: Collecting 100 samples in estimated 5.1210
s (172k iterations)
Benchmarking int32/min nullable: Analyzing
int32/min nullable time: [29.749 µs 29.815 µs 29.885 µs]
thrpt: [8.1693 GiB/s 8.1885 GiB/s 8.2065 GiB/s]
change:
time: [+14.072% +14.457% +14.836%] (p = 0.00 <
0.05)
thrpt: [-12.919% -12.631% -12.336%]
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
10 (10.00%) high mild
3 (3.00%) high severe
Benchmarking int32/max nullable
Benchmarking int32/max nullable: Warming up for 3.0000 s
Benchmarking int32/max nullable: Collecting 100 samples in estimated 5.1090
s (172k iterations)
Benchmarking int32/max nullable: Analyzing
int32/max nullable time: [29.923 µs 30.000 µs 30.072 µs]
thrpt: [8.1185 GiB/s 8.1381 GiB/s 8.1588 GiB/s]
change:
time: [+14.095% +14.522% +14.892%] (p = 0.00 <
0.05)
thrpt: [-12.962% -12.681% -12.354%]
Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high mild
Benchmarking int64/sum nonnull
Benchmarking int64/sum nonnull: Warming up for 3.0000 s
Benchmarking int64/sum nonnull: Collecting 100 samples in estimated 5.0249 s
(1.0M iterations)
Benchmarking int64/sum nonnull: Analyzing
int64/sum nonnull time: [4.9775 µs 4.9915 µs 5.0056 µs]
thrpt: [97.547 GiB/s 97.823 GiB/s 98.098 GiB/s]
change:
time: [+3.8393% +4.1991% +4.5411%] (p = 0.00 <
0.05)
thrpt: [-4.3439% -4.0298% -3.6973%]
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
Benchmarking int64/min nonnull
Benchmarking int64/min nonnull: Warming up for 3.0000 s
Benchmarking int64/min nonnull: Collecting 100 samples in estimated 5.0209 s
(550k iterations)
Benchmarking int64/min nonnull: Analyzing
int64/min nonnull time: [9.0822 µs 9.0970 µs 9.1127 µs]
thrpt: [53.582 GiB/s 53.675 GiB/s 53.763 GiB/s]
change:
time: [+0.1531% +1.1697% +2.1142%] (p = 0.02 <
0.05)
thrpt: [-2.0704% -1.1561% -0.1529%]
Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
Benchmarking int64/max nonnull
Benchmarking int64/max nonnull: Warming up for 3.0000 s
Benchmarking int64/max nonnull: Collecting 100 samples in estimated 5.0093 s
(550k iterations)
Benchmarking int64/max nonnull: Analyzing
int64/max nonnull time: [9.0871 µs 9.1068 µs 9.1282 µs]
thrpt: [53.491 GiB/s 53.617 GiB/s 53.733 GiB/s]
change:
time: [+3.3412% +3.6883% +4.0543%] (p = 0.00 <
0.05)
thrpt: [-3.8964% -3.5571% -3.2332%]
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe
Benchmarking int64/sum nullable
Benchmarking int64/sum nullable: Warming up for 3.0000 s
Benchmarking int64/sum nullable: Collecting 100 samples in estimated 5.0577
s (288k iterations)
Benchmarking int64/sum nullable: Analyzing
int64/sum nullable time: [17.531 µs 17.592 µs 17.653 µs]
thrpt: [27.659 GiB/s 27.756 GiB/s 27.852 GiB/s]
change:
time: [+84.485% +86.149% +87.741%] (p = 0.00 <
0.05)
thrpt: [-46.735% -46.280% -45.795%]
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
10 (10.00%) high mild
4 (4.00%) high severe
Benchmarking int64/min nullable
Benchmarking int64/min nullable: Warming up for 3.0000 s
Benchmarking int64/min nullable: Collecting 100 samples in estimated 5.1196
s (116k iterations)
Benchmarking int64/min nullable: Analyzing
int64/min nullable time: [43.743 µs 43.834 µs 43.935 µs]
thrpt: [11.114 GiB/s 11.139 GiB/s 11.163 GiB/s]
change:
time: [+66.879% +67.747% +68.541%] (p = 0.00 <
0.05)
thrpt: [-40.667% -40.387% -40.076%]
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) low mild
3 (3.00%) high mild
2 (2.00%) high severe
Benchmarking int64/max nullable
Benchmarking int64/max nullable: Warming up for 3.0000 s
Benchmarking int64/max nullable: Collecting 100 samples in estimated 5.0448
s (116k iterations)
Benchmarking int64/max nullable: Analyzing
int64/max nullable time: [43.566 µs 43.668 µs 43.765 µs]
thrpt: [11.157 GiB/s 11.182 GiB/s 11.208 GiB/s]
change:
time: [+67.097% +67.871% +68.564%] (p = 0.00 <
0.05)
thrpt: [-40.675% -40.431% -40.155%]
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
Benchmarking string/min nonnull
Benchmarking string/min nonnull: Warming up for 3.0000 s
Benchmarking string/min nonnull: Collecting 100 samples in estimated 5.0449
s (40k iterations)
Benchmarking string/min nonnull: Analyzing
string/min nonnull time: [124.28 µs 124.53 µs 124.82 µs]
thrpt: [525.03 Melem/s 526.26 Melem/s 527.33
Melem/s]
change:
time: [-1.5237% -0.5664% +0.2709%] (p = 0.23 >
0.05)
thrpt: [-0.2701% +0.5696% +1.5473%]
No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
7 (7.00%) high mild
1 (1.00%) high severe
Benchmarking string/max nonnull
Benchmarking string/max nonnull: Warming up for 3.0000 s
Benchmarking string/max nonnull: Collecting 100 samples in estimated 5.6891
s (40k iterations)
Benchmarking string/max nonnull: Analyzing
string/max nonnull time: [140.92 µs 141.13 µs 141.35 µs]
thrpt: [463.63 Melem/s 464.35 Melem/s 465.05
Melem/s]
change:
time: [+2.2748% +2.9322% +3.4956%] (p = 0.00 <
0.05)
thrpt: [-3.3776% -2.8486% -2.2242%]
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
5 (5.00%) high mild
1 (1.00%) high severe
Benchmarking string/min nullable
Benchmarking string/min nullable: Warming up for 3.0000 s
Benchmarking string/min nullable: Collecting 100 samples in estimated 5.4040
s (61k iterations)
Benchmarking string/min nullable: Analyzing
string/min nullable time: [87.851 µs 87.970 µs 88.096 µs]
thrpt: [743.91 Melem/s 744.98 Melem/s 745.99
Melem/s]
change:
time: [+0.1933% +0.4888% +0.8375%] (p = 0.00 <
0.05)
thrpt: [-0.8306% -0.4864% -0.1929%]
Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
Benchmarking string/max nullable
Benchmarking string/max nullable: Warming up for 3.0000 s
Benchmarking string/max nullable: Collecting 100 samples in estimated 5.3001
s (61k iterations)
Benchmarking string/max nullable: Analyzing
string/max nullable time: [85.671 µs 86.304 µs 87.363 µs]
thrpt: [750.16 Melem/s 759.36 Melem/s 764.97
Melem/s]
change:
time: [-8.1227% -6.5198% -4.9163%] (p = 0.00 <
0.05)
thrpt: [+5.1705% +6.9746% +8.8408%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
```
</p>
</details>
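When reading the `change:` blocks above, note that the throughput change is just the time change inverted, since the amount of data per iteration is fixed. Below is a minimal sketch of that arithmetic; the helper name `thrpt_change_pct` is hypothetical (it is not part of Criterion or arrow-rs), and the inputs are median time changes copied from the log. Criterion derives its own figures from the measured distributions, so the last digit can differ slightly.

```rust
/// Convert a relative change in time (percent) into the implied relative
/// change in throughput (percent), assuming the work per iteration is fixed.
/// Hypothetical helper for illustration only.
fn thrpt_change_pct(time_change_pct: f64) -> f64 {
    // new_time / old_time
    let time_ratio = 1.0 + time_change_pct / 100.0;
    // throughput scales with 1 / time
    (1.0 / time_ratio - 1.0) * 100.0
}

fn main() {
    // Median time changes taken from the benchmark output above.
    for (name, time_pct) in [
        ("int8/sum nullable", 98.683),
        ("int8/min nullable", -78.112),
        ("int64/min nullable", 67.747),
    ] {
        println!(
            "{name}: time {time_pct:+.3}% -> thrpt {:+.3}%",
            thrpt_change_pct(time_pct)
        );
    }
}
```

This is why a time regression of roughly +100% shows up as a throughput drop of roughly -50%, while a time improvement of about -99% shows up as a throughput gain of several thousand percent.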
<details><summary>Test script</summary>
<p>
```bash
#git merge-base HEAD origin/master
#61da64a0557c80af5bb43b5f15c6d8bb6a314cb2
#gh pr checkout https://github.com/apache/arrow-rs/pull/5100
echo "***compare using nightly***"
git checkout 61da64a0557c80af5bb43b5f15c6d8bb6a314cb2
RUSTFLAGS="-Ctarget-cpu=native" cargo +nightly bench --features=simd --bench aggregate_kernels
gh pr checkout https://github.com/apache/arrow-rs/pull/5100
RUSTFLAGS="-Ctarget-cpu=native" cargo +nightly bench --features=simd --bench aggregate_kernels
echo "*** compare using stable ***"
git checkout 61da64a0557c80af5bb43b5f15c6d8bb6a314cb2
RUSTFLAGS="-Ctarget-cpu=native" cargo +nightly bench --features=simd --bench aggregate_kernels
gh pr checkout https://github.com/apache/arrow-rs/pull/5100
RUSTFLAGS="-Ctarget-cpu=native" cargo +1.73.0 bench --bench aggregate_kernels
```
</p>
</details>