alamb commented on PR #5100:
URL: https://github.com/apache/arrow-rs/pull/5100#issuecomment-1827933052

   Here are my performance results:
   
   Machine:
   ```
     Model Name:        MacBook Pro
     Model Identifier:  Mac15,9
     Model Number:      Z1AH000VNLL/A
     Chip:      Apple M3 Max
     Total Number of Cores:     16 (12 performance and 4 efficiency)
     Memory:    64 GB
   ```
   ## `master` @ 61da64a0557c80af5bb43b5f15c6d8bb6a314cb2 with `simd` vs branch (both with nightly Rust)
   
   <details><summary>Details</summary>
   <p>
   
   ```
        Running benches/aggregate_kernels.rs (target/release/deps/aggregate_kernels-0072a695b99ab014)
   Benchmarking float32/sum nonnull
   Benchmarking float32/sum nonnull: Warming up for 3.0000 s
   Benchmarking float32/sum nonnull: Collecting 100 samples in estimated 5.0218 s (773k iterations)
   Benchmarking float32/sum nonnull: Analyzing
   float32/sum nonnull     time:   [6.4869 µs 6.4915 µs 6.4973 µs]
                           thrpt:  [37.576 GiB/s 37.609 GiB/s 37.636 GiB/s]
                    change:
                           time:   [+112.84% +113.46% +114.05%] (p = 0.00 < 0.05)
                           thrpt:  [-53.282% -53.152% -53.016%]
                           Performance has regressed.
   Found 6 outliers among 100 measurements (6.00%)
     2 (2.00%) high mild
     4 (4.00%) high severe
   Benchmarking float32/min nonnull
   Benchmarking float32/min nonnull: Warming up for 3.0000 s
   Benchmarking float32/min nonnull: Collecting 100 samples in estimated 5.0622 s (232k iterations)
   Benchmarking float32/min nonnull: Analyzing
   float32/min nonnull     time:   [21.741 µs 21.756 µs 21.772 µs]
                           thrpt:  [11.213 GiB/s 11.222 GiB/s 11.229 GiB/s]
                    change:
                           time:   [+121.43% +122.23% +122.94%] (p = 0.00 < 0.05)
                           thrpt:  [-55.146% -55.002% -54.839%]
                           Performance has regressed.
   Found 10 outliers among 100 measurements (10.00%)
     6 (6.00%) high mild
     4 (4.00%) high severe
   Benchmarking float32/max nonnull
   Benchmarking float32/max nonnull: Warming up for 3.0000 s
   Benchmarking float32/max nonnull: Collecting 100 samples in estimated 5.0525 s (232k iterations)
   Benchmarking float32/max nonnull: Analyzing
   float32/max nonnull     time:   [21.489 µs 21.530 µs 21.578 µs]
                           thrpt:  [11.315 GiB/s 11.340 GiB/s 11.361 GiB/s]
                    change:
                           time:   [+216.76% +218.04% +219.36%] (p = 0.00 < 0.05)
                           thrpt:  [-68.687% -68.557% -68.431%]
                           Performance has regressed.
   Found 4 outliers among 100 measurements (4.00%)
     1 (1.00%) low severe
     2 (2.00%) high mild
     1 (1.00%) high severe
   Benchmarking float32/sum nullable
   Benchmarking float32/sum nullable: Warming up for 3.0000 s
   Benchmarking float32/sum nullable: Collecting 100 samples in estimated 5.0088 s (470k iterations)
   Benchmarking float32/sum nullable: Analyzing
   float32/sum nullable    time:   [10.645 µs 10.654 µs 10.663 µs]
                           thrpt:  [22.897 GiB/s 22.916 GiB/s 22.935 GiB/s]
                    change:
                           time:   [+129.35% +129.98% +130.61%] (p = 0.00 < 0.05)
                           thrpt:  [-56.636% -56.517% -56.399%]
                           Performance has regressed.
   Found 11 outliers among 100 measurements (11.00%)
     1 (1.00%) low mild
     5 (5.00%) high mild
     5 (5.00%) high severe
   Benchmarking float32/min nullable
   Benchmarking float32/min nullable: Warming up for 3.0000 s
   Benchmarking float32/min nullable: Collecting 100 samples in estimated 5.1691 s (106k iterations)
   Benchmarking float32/min nullable: Analyzing
   float32/min nullable    time:   [48.697 µs 48.749 µs 48.809 µs]
                           thrpt:  [5.0019 GiB/s 5.0081 GiB/s 5.0135 GiB/s]
                    change:
                           time:   [+82.253% +83.059% +83.836%] (p = 0.00 < 0.05)
                           thrpt:  [-45.604% -45.373% -45.131%]
                           Performance has regressed.
   Found 11 outliers among 100 measurements (11.00%)
     4 (4.00%) high mild
     7 (7.00%) high severe
   Benchmarking float32/max nullable
   Benchmarking float32/max nullable: Warming up for 3.0000 s
   Benchmarking float32/max nullable: Collecting 100 samples in estimated 5.1709 s (106k iterations)
   Benchmarking float32/max nullable: Analyzing
   float32/max nullable    time:   [48.719 µs 48.793 µs 48.884 µs]
                           thrpt:  [4.9943 GiB/s 5.0036 GiB/s 5.0112 GiB/s]
                    change:
                           time:   [+102.72% +104.47% +106.07%] (p = 0.00 < 0.05)
                           thrpt:  [-51.473% -51.094% -50.670%]
                           Performance has regressed.
   Found 16 outliers among 100 measurements (16.00%)
     6 (6.00%) high mild
     10 (10.00%) high severe
   
   Benchmarking float64/sum nonnull
   Benchmarking float64/sum nonnull: Warming up for 3.0000 s
   Benchmarking float64/sum nonnull: Collecting 100 samples in estimated 5.0318 s (429k iterations)
   Benchmarking float64/sum nonnull: Analyzing
   float64/sum nonnull     time:   [11.717 µs 11.748 µs 11.777 µs]
                           thrpt:  [41.462 GiB/s 41.562 GiB/s 41.674 GiB/s]
                    change:
                           time:   [+96.573% +97.450% +98.222%] (p = 0.00 < 0.05)
                           thrpt:  [-49.552% -49.354% -49.128%]
                           Performance has regressed.
   Found 3 outliers among 100 measurements (3.00%)
     1 (1.00%) low mild
     1 (1.00%) high mild
     1 (1.00%) high severe
   Benchmarking float64/min nonnull
   Benchmarking float64/min nonnull: Warming up for 3.0000 s
   Benchmarking float64/min nonnull: Collecting 100 samples in estimated 5.0319 s (86k iterations)
   Benchmarking float64/min nonnull: Analyzing
   float64/min nonnull     time:   [57.630 µs 57.765 µs 57.921 µs]
                           thrpt:  [8.4301 GiB/s 8.4530 GiB/s 8.4727 GiB/s]
                    change:
                           time:   [+196.14% +197.63% +199.35%] (p = 0.00 < 0.05)
                           thrpt:  [-66.595% -66.402% -66.232%]
                           Performance has regressed.
   Found 7 outliers among 100 measurements (7.00%)
     4 (4.00%) high mild
     3 (3.00%) high severe
   Benchmarking float64/max nonnull
   Benchmarking float64/max nonnull: Warming up for 3.0000 s
   Benchmarking float64/max nonnull: Collecting 100 samples in estimated 5.0342 s (121k iterations)
   Benchmarking float64/max nonnull: Analyzing
   float64/max nonnull     time:   [41.669 µs 41.851 µs 42.026 µs]
                           thrpt:  [11.619 GiB/s 11.667 GiB/s 11.718 GiB/s]
                    change:
                           time:   [+204.20% +205.40% +206.66%] (p = 0.00 < 0.05)
                           thrpt:  [-67.390% -67.256% -67.127%]
                           Performance has regressed.
   Found 17 outliers among 100 measurements (17.00%)
     6 (6.00%) low severe
     3 (3.00%) low mild
     2 (2.00%) high mild
     6 (6.00%) high severe
   Benchmarking float64/sum nullable
   Benchmarking float64/sum nullable: Warming up for 3.0000 s
   Benchmarking float64/sum nullable: Collecting 100 samples in estimated 5.0564 s (227k iterations)
   Benchmarking float64/sum nullable: Analyzing
   float64/sum nullable    time:   [22.209 µs 22.224 µs 22.241 µs]
                           thrpt:  [21.954 GiB/s 21.971 GiB/s 21.985 GiB/s]
                    change:
                           time:   [+138.22% +139.19% +140.17%] (p = 0.00 < 0.05)
                           thrpt:  [-58.363% -58.192% -58.021%]
                           Performance has regressed.
   Found 11 outliers among 100 measurements (11.00%)
     3 (3.00%) low severe
     4 (4.00%) high mild
     4 (4.00%) high severe
   Benchmarking float64/min nullable
   Benchmarking float64/min nullable: Warming up for 3.0000 s
   Benchmarking float64/min nullable: Collecting 100 samples in estimated 5.4217 s (56k iterations)
   Benchmarking float64/min nullable: Analyzing
   float64/min nullable    time:   [97.439 µs 97.559 µs 97.693 µs]
                           thrpt:  [4.9981 GiB/s 5.0050 GiB/s 5.0111 GiB/s]
                    change:
                           time:   [+158.93% +160.03% +161.18%] (p = 0.00 < 0.05)
                           thrpt:  [-61.712% -61.543% -61.380%]
                           Performance has regressed.
   Found 9 outliers among 100 measurements (9.00%)
     5 (5.00%) high mild
     4 (4.00%) high severe
   Benchmarking float64/max nullable
   Benchmarking float64/max nullable: Warming up for 3.0000 s
   Benchmarking float64/max nullable: Collecting 100 samples in estimated 5.4214 s (56k iterations)
   Benchmarking float64/max nullable: Analyzing
   float64/max nullable    time:   [97.401 µs 97.493 µs 97.602 µs]
                           thrpt:  [5.0028 GiB/s 5.0083 GiB/s 5.0131 GiB/s]
                    change:
                           time:   [+202.23% +203.27% +204.73%] (p = 0.00 < 0.05)
                           thrpt:  [-67.184% -67.026% -66.913%]
                           Performance has regressed.
   Found 7 outliers among 100 measurements (7.00%)
     2 (2.00%) high mild
     5 (5.00%) high severe
   
   Benchmarking int8/sum nonnull
   Benchmarking int8/sum nonnull: Warming up for 3.0000 s
   Benchmarking int8/sum nonnull: Collecting 100 samples in estimated 5.0023 s (9.3M iterations)
   Benchmarking int8/sum nonnull: Analyzing
   int8/sum nonnull        time:   [536.05 ns 536.60 ns 537.27 ns]
                           thrpt:  [113.60 GiB/s 113.74 GiB/s 113.86 GiB/s]
                    change:
                           time:   [-1.3393% -0.9518% -0.5531%] (p = 0.00 < 0.05)
                           thrpt:  [+0.5561% +0.9609% +1.3575%]
                           Change within noise threshold.
   Found 11 outliers among 100 measurements (11.00%)
     5 (5.00%) high mild
     6 (6.00%) high severe
   Benchmarking int8/min nonnull
   Benchmarking int8/min nonnull: Warming up for 3.0000 s
   Benchmarking int8/min nonnull: Collecting 100 samples in estimated 5.0002 s (9.3M iterations)
   Benchmarking int8/min nonnull: Analyzing
   int8/min nonnull        time:   [535.70 ns 536.25 ns 536.80 ns]
                           thrpt:  [113.70 GiB/s 113.82 GiB/s 113.94 GiB/s]
                    change:
                           time:   [-98.979% -98.976% -98.973%] (p = 0.00 < 0.05)
                           thrpt:  [+9633.9% +9662.2% +9693.3%]
                           Performance has improved.
   Found 13 outliers among 100 measurements (13.00%)
     1 (1.00%) low mild
     7 (7.00%) high mild
     5 (5.00%) high severe
   Benchmarking int8/max nonnull
   Benchmarking int8/max nonnull: Warming up for 3.0000 s
   Benchmarking int8/max nonnull: Collecting 100 samples in estimated 5.0007 s (9.3M iterations)
   Benchmarking int8/max nonnull: Analyzing
   int8/max nonnull        time:   [535.67 ns 536.06 ns 536.49 ns]
                           thrpt:  [113.77 GiB/s 113.86 GiB/s 113.94 GiB/s]
                    change:
                           time:   [-98.965% -98.962% -98.959%] (p = 0.00 < 0.05)
                           thrpt:  [+9503.6% +9532.7% +9563.0%]
                           Performance has improved.
   Found 10 outliers among 100 measurements (10.00%)
     1 (1.00%) low severe
     5 (5.00%) high mild
     4 (4.00%) high severe
   Benchmarking int8/sum nullable
   Benchmarking int8/sum nullable: Warming up for 3.0000 s
   Benchmarking int8/sum nullable: Collecting 100 samples in estimated 5.0232 s (707k iterations)
   Benchmarking int8/sum nullable: Analyzing
   int8/sum nullable       time:   [7.0953 µs 7.1011 µs 7.1070 µs]
                           thrpt:  [8.5881 GiB/s 8.5952 GiB/s 8.6022 GiB/s]
                    change:
                           time:   [+87.353% +88.096% +88.866%] (p = 0.00 < 0.05)
                           thrpt:  [-47.052% -46.836% -46.625%]
                           Performance has regressed.
   Found 14 outliers among 100 measurements (14.00%)
     1 (1.00%) low mild
     6 (6.00%) high mild
     7 (7.00%) high severe
   Benchmarking int8/min nullable
   Benchmarking int8/min nullable: Warming up for 3.0000 s
   Benchmarking int8/min nullable: Collecting 100 samples in estimated 5.0079 s (631k iterations)
   Benchmarking int8/min nullable: Analyzing
   int8/min nullable       time:   [7.9245 µs 7.9300 µs 7.9357 µs]
                           thrpt:  [7.6912 GiB/s 7.6968 GiB/s 7.7021 GiB/s]
                    change:
                           time:   [-79.180% -79.104% -79.028%] (p = 0.00 < 0.05)
                           thrpt:  [+376.83% +378.56% +380.30%]
                           Performance has improved.
   Found 12 outliers among 100 measurements (12.00%)
     1 (1.00%) low severe
     1 (1.00%) low mild
     7 (7.00%) high mild
     3 (3.00%) high severe
   Benchmarking int8/max nullable
   Benchmarking int8/max nullable: Warming up for 3.0000 s
   Benchmarking int8/max nullable: Collecting 100 samples in estimated 5.0048 s (631k iterations)
   Benchmarking int8/max nullable: Analyzing
   int8/max nullable       time:   [7.9373 µs 7.9456 µs 7.9539 µs]
                           thrpt:  [7.6736 GiB/s 7.6816 GiB/s 7.6897 GiB/s]
                    change:
                           time:   [-79.127% -79.063% -79.000%] (p = 0.00 < 0.05)
                           thrpt:  [+376.18% +377.62% +379.10%]
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     1 (1.00%) low mild
     5 (5.00%) high mild
     2 (2.00%) high severe
   
   Benchmarking int16/sum nonnull
   Benchmarking int16/sum nonnull: Warming up for 3.0000 s
   Benchmarking int16/sum nonnull: Collecting 100 samples in estimated 5.0033 s (4.6M iterations)
   Benchmarking int16/sum nonnull: Analyzing
   int16/sum nonnull       time:   [1.3119 µs 1.3372 µs 1.3573 µs]
                           thrpt:  [89.937 GiB/s 91.286 GiB/s 93.047 GiB/s]
                    change:
                           time:   [+8.5860% +10.958% +13.451%] (p = 0.00 < 0.05)
                           thrpt:  [-11.856% -9.8760% -7.9071%]
                           Performance has regressed.
   Benchmarking int16/min nonnull
   Benchmarking int16/min nonnull: Warming up for 3.0000 s
   Benchmarking int16/min nonnull: Collecting 100 samples in estimated 5.0022 s (3.3M iterations)
   Benchmarking int16/min nonnull: Analyzing
   int16/min nonnull       time:   [1.3759 µs 1.3845 µs 1.3915 µs]
                           thrpt:  [87.725 GiB/s 88.170 GiB/s 88.721 GiB/s]
                    change:
                           time:   [+17.717% +19.400% +20.696%] (p = 0.00 < 0.05)
                           thrpt:  [-17.147% -16.248% -15.050%]
                           Performance has regressed.
   Found 16 outliers among 100 measurements (16.00%)
     11 (11.00%) low severe
     1 (1.00%) low mild
     1 (1.00%) high mild
     3 (3.00%) high severe
   Benchmarking int16/max nonnull
   Benchmarking int16/max nonnull: Warming up for 3.0000 s
   Benchmarking int16/max nonnull: Collecting 100 samples in estimated 5.0056 s (3.6M iterations)
   Benchmarking int16/max nonnull: Analyzing
   int16/max nonnull       time:   [1.3821 µs 1.3867 µs 1.3905 µs]
                           thrpt:  [87.786 GiB/s 88.031 GiB/s 88.324 GiB/s]
                    change:
                           time:   [+18.013% +19.315% +20.517%] (p = 0.00 < 0.05)
                           thrpt:  [-17.024% -16.188% -15.263%]
                           Performance has regressed.
   Found 13 outliers among 100 measurements (13.00%)
     9 (9.00%) low severe
     2 (2.00%) low mild
     1 (1.00%) high mild
     1 (1.00%) high severe
   Benchmarking int16/sum nullable
   Benchmarking int16/sum nullable: Warming up for 3.0000 s
   Benchmarking int16/sum nullable: Collecting 100 samples in estimated 5.0016 s (510k iterations)
   Benchmarking int16/sum nullable: Analyzing
   int16/sum nullable      time:   [9.6419 µs 9.7203 µs 9.7834 µs]
                           thrpt:  [12.477 GiB/s 12.558 GiB/s 12.660 GiB/s]
                    change:
                           time:   [+129.75% +132.40% +134.85%] (p = 0.00 < 0.05)
                           thrpt:  [-57.419% -56.971% -56.475%]
                           Performance has regressed.
   Found 12 outliers among 100 measurements (12.00%)
     12 (12.00%) low severe
   Benchmarking int16/min nullable
   Benchmarking int16/min nullable: Warming up for 3.0000 s
   Benchmarking int16/min nullable: Collecting 100 samples in estimated 5.0587 s (303k iterations)
   Benchmarking int16/min nullable: Analyzing
   int16/min nullable      time:   [16.504 µs 16.619 µs 16.709 µs]
                           thrpt:  [7.3057 GiB/s 7.3451 GiB/s 7.3963 GiB/s]
                    change:
                           time:   [+138.61% +139.53% +140.32%] (p = 0.00 < 0.05)
                           thrpt:  [-58.389% -58.252% -58.090%]
                           Performance has regressed.
   Found 6 outliers among 100 measurements (6.00%)
     5 (5.00%) low severe
     1 (1.00%) low mild
   Benchmarking int16/max nullable
   Benchmarking int16/max nullable: Warming up for 3.0000 s
   Benchmarking int16/max nullable: Collecting 100 samples in estimated 5.0505 s (303k iterations)
   Benchmarking int16/max nullable: Analyzing
   int16/max nullable      time:   [16.333 µs 18.991 µs 24.750 µs]
                           thrpt:  [4.9321 GiB/s 6.4277 GiB/s 7.4736 GiB/s]
                    change:
                           time:   [+136.19% +155.80% +193.27%] (p = 0.00 < 0.05)
                           thrpt:  [-65.902% -60.907% -57.661%]
                           Performance has regressed.
   Found 19 outliers among 100 measurements (19.00%)
     11 (11.00%) low severe
     5 (5.00%) low mild
     3 (3.00%) high severe
   
   Benchmarking int32/sum nonnull
   Benchmarking int32/sum nonnull: Warming up for 3.0000 s
   Benchmarking int32/sum nonnull: Collecting 100 samples in estimated 5.0076 s (1.6M iterations)
   Benchmarking int32/sum nonnull: Analyzing
   int32/sum nonnull       time:   [3.1533 µs 3.1678 µs 3.1781 µs]
                           thrpt:  [76.819 GiB/s 77.069 GiB/s 77.424 GiB/s]
                    change:
                           time:   [+29.275% +30.126% +30.900%] (p = 0.00 < 0.05)
                           thrpt:  [-23.606% -23.152% -22.646%]
                           Performance has regressed.
   Found 8 outliers among 100 measurements (8.00%)
     6 (6.00%) low severe
     2 (2.00%) low mild
   Benchmarking int32/min nonnull
   Benchmarking int32/min nonnull: Warming up for 3.0000 s
   Benchmarking int32/min nonnull: Collecting 100 samples in estimated 5.0153 s (1.6M iterations)
   Benchmarking int32/min nonnull: Analyzing
   int32/min nonnull       time:   [3.1498 µs 3.1651 µs 3.1780 µs]
                           thrpt:  [76.822 GiB/s 77.134 GiB/s 77.510 GiB/s]
                    change:
                           time:   [+29.048% +29.399% +29.750%] (p = 0.00 < 0.05)
                           thrpt:  [-22.929% -22.720% -22.509%]
                           Performance has regressed.
   Found 4 outliers among 100 measurements (4.00%)
     4 (4.00%) low severe
   Benchmarking int32/max nonnull
   Benchmarking int32/max nonnull: Warming up for 3.0000 s
   Benchmarking int32/max nonnull: Collecting 100 samples in estimated 5.0029 s (1.6M iterations)
   Benchmarking int32/max nonnull: Analyzing
   int32/max nonnull       time:   [3.1712 µs 3.1782 µs 3.1846 µs]
                           thrpt:  [76.662 GiB/s 76.816 GiB/s 76.987 GiB/s]
                    change:
                           time:   [+29.378% +29.692% +29.959%] (p = 0.00 < 0.05)
                           thrpt:  [-23.053% -22.894% -22.707%]
                           Performance has regressed.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) low mild
   Benchmarking int32/sum nullable
   Benchmarking int32/sum nullable: Warming up for 3.0000 s
   Benchmarking int32/sum nullable: Collecting 100 samples in estimated 5.0491 s (444k iterations)
   Benchmarking int32/sum nullable: Analyzing
   int32/sum nullable      time:   [11.295 µs 11.343 µs 11.383 µs]
                           thrpt:  [21.448 GiB/s 21.523 GiB/s 21.615 GiB/s]
                    change:
                           time:   [+142.74% +144.27% +145.38%] (p = 0.00 < 0.05)
                           thrpt:  [-59.246% -59.062% -58.803%]
                           Performance has regressed.
   Found 6 outliers among 100 measurements (6.00%)
     6 (6.00%) low severe
   Benchmarking int32/min nullable
   Benchmarking int32/min nullable: Warming up for 3.0000 s
   Benchmarking int32/min nullable: Collecting 100 samples in estimated 5.1615 s (136k iterations)
   Benchmarking int32/min nullable: Analyzing
   int32/min nullable      time:   [37.962 µs 38.004 µs 38.045 µs]
                           thrpt:  [6.4172 GiB/s 6.4240 GiB/s 6.4312 GiB/s]
                    change:
                           time:   [+43.232% +44.819% +46.001%] (p = 0.00 < 0.05)
                           thrpt:  [-31.507% -30.948% -30.183%]
                           Performance has regressed.
   Found 7 outliers among 100 measurements (7.00%)
     5 (5.00%) low severe
     2 (2.00%) high severe
   Benchmarking int32/max nullable
   Benchmarking int32/max nullable: Warming up for 3.0000 s
   Benchmarking int32/max nullable: Collecting 100 samples in estimated 5.1674 s (136k iterations)
   Benchmarking int32/max nullable: Analyzing
   int32/max nullable      time:   [37.742 µs 37.873 µs 37.973 µs]
                           thrpt:  [6.4293 GiB/s 6.4463 GiB/s 6.4686 GiB/s]
                    change:
                           time:   [+44.745% +45.470% +46.065%] (p = 0.00 < 0.05)
                           thrpt:  [-31.537% -31.257% -30.913%]
                           Performance has regressed.
   Found 6 outliers among 100 measurements (6.00%)
     4 (4.00%) low severe
     2 (2.00%) high mild
   
   Benchmarking int64/sum nonnull
   Benchmarking int64/sum nonnull: Warming up for 3.0000 s
   Benchmarking int64/sum nonnull: Collecting 100 samples in estimated 5.0113 s (793k iterations)
   Benchmarking int64/sum nonnull: Analyzing
   int64/sum nonnull       time:   [6.2206 µs 6.2618 µs 6.2947 µs]
                           thrpt:  [77.570 GiB/s 77.977 GiB/s 78.495 GiB/s]
                    change:
                           time:   [+28.297% +29.368% +30.379%] (p = 0.00 < 0.05)
                           thrpt:  [-23.300% -22.701% -22.056%]
                           Performance has regressed.
   Found 18 outliers among 100 measurements (18.00%)
     17 (17.00%) low severe
     1 (1.00%) high mild
   Benchmarking int64/min nonnull
   Benchmarking int64/min nonnull: Warming up for 3.0000 s
   Benchmarking int64/min nonnull: Collecting 100 samples in estimated 5.0368 s (439k iterations)
   Benchmarking int64/min nonnull: Analyzing
   int64/min nonnull       time:   [11.457 µs 11.524 µs 11.565 µs]
                           thrpt:  [42.221 GiB/s 42.369 GiB/s 42.619 GiB/s]
                    change:
                           time:   [+27.465% +27.939% +28.306%] (p = 0.00 < 0.05)
                           thrpt:  [-22.061% -21.838% -21.547%]
                           Performance has regressed.
   Found 4 outliers among 100 measurements (4.00%)
     1 (1.00%) low severe
     1 (1.00%) low mild
     2 (2.00%) high mild
   Benchmarking int64/max nonnull
   Benchmarking int64/max nonnull: Warming up for 3.0000 s
   Benchmarking int64/max nonnull: Collecting 100 samples in estimated 5.0537 s (439k iterations)
   Benchmarking int64/max nonnull: Analyzing
   int64/max nonnull       time:   [11.472 µs 11.518 µs 11.548 µs]
                           thrpt:  [42.283 GiB/s 42.393 GiB/s 42.563 GiB/s]
                    change:
                           time:   [+27.326% +27.832% +28.178%] (p = 0.00 < 0.05)
                           thrpt:  [-21.983% -21.772% -21.461%]
                           Performance has regressed.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) low severe
   Benchmarking int64/sum nullable
   Benchmarking int64/sum nullable: Warming up for 3.0000 s
   Benchmarking int64/sum nullable: Collecting 100 samples in estimated 5.0576 s (227k iterations)
   Benchmarking int64/sum nullable: Analyzing
   int64/sum nullable      time:   [22.159 µs 22.231 µs 22.286 µs]
                           thrpt:  [21.910 GiB/s 21.964 GiB/s 22.036 GiB/s]
                    change:
                           time:   [+138.44% +139.39% +140.19%] (p = 0.00 < 0.05)
                           thrpt:  [-58.366% -58.228% -58.061%]
                           Performance has regressed.
   Found 4 outliers among 100 measurements (4.00%)
     3 (3.00%) low severe
     1 (1.00%) low mild
   Benchmarking int64/min nullable
   Benchmarking int64/min nullable: Warming up for 3.0000 s
   Benchmarking int64/min nullable: Collecting 100 samples in estimated 5.1037 s (91k iterations)
   Benchmarking int64/min nullable: Analyzing
   int64/min nullable      time:   [56.135 µs 56.211 µs 56.289 µs]
                           thrpt:  [8.6745 GiB/s 8.6865 GiB/s 8.6983 GiB/s]
                    change:
                           time:   [+103.67% +104.18% +104.58%] (p = 0.00 < 0.05)
                           thrpt:  [-51.120% -51.024% -50.902%]
                           Performance has regressed.
   Benchmarking int64/max nullable
   Benchmarking int64/max nullable: Warming up for 3.0000 s
   Benchmarking int64/max nullable: Collecting 100 samples in estimated 5.1050 s (91k iterations)
   Benchmarking int64/max nullable: Analyzing
   int64/max nullable      time:   [56.167 µs 56.277 µs 56.382 µs]
                           thrpt:  [8.6603 GiB/s 8.6764 GiB/s 8.6934 GiB/s]
                    change:
                           time:   [+102.92% +103.74% +104.42%] (p = 0.00 < 0.05)
                           thrpt:  [-51.082% -50.919% -50.720%]
                           Performance has regressed.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) low severe
   
   Benchmarking string/min nonnull
   Benchmarking string/min nonnull: Warming up for 3.0000 s
   Benchmarking string/min nonnull: Collecting 100 samples in estimated 5.0352 s (30k iterations)
   Benchmarking string/min nonnull: Analyzing
   string/min nonnull      time:   [155.50 µs 156.52 µs 157.37 µs]
                           thrpt:  [416.44 Melem/s 418.70 Melem/s 421.46 Melem/s]
                    change:
                           time:   [+23.742% +25.113% +26.286%] (p = 0.00 < 0.05)
                           thrpt:  [-20.815% -20.072% -19.186%]
                           Performance has regressed.
   Found 22 outliers among 100 measurements (22.00%)
     19 (19.00%) low severe
     2 (2.00%) high mild
     1 (1.00%) high severe
   Benchmarking string/max nonnull
   Benchmarking string/max nonnull: Warming up for 3.0000 s
   Benchmarking string/max nonnull: Collecting 100 samples in estimated 5.5497 s (35k iterations)
   Benchmarking string/max nonnull: Analyzing
   string/max nonnull      time:   [156.72 µs 157.11 µs 157.45 µs]
                           thrpt:  [416.24 Melem/s 417.12 Melem/s 418.16 Melem/s]
                    change:
                           time:   [+11.214% +11.736% +12.164%] (p = 0.00 < 0.05)
                           thrpt:  [-10.844% -10.504% -10.083%]
                           Performance has regressed.
   Found 6 outliers among 100 measurements (6.00%)
     4 (4.00%) low severe
     1 (1.00%) low mild
     1 (1.00%) high mild
   Benchmarking string/min nullable
   Benchmarking string/min nullable: Warming up for 3.0000 s
   Benchmarking string/min nullable: Collecting 100 samples in estimated 5.0964 s (45k iterations)
   Benchmarking string/min nullable: Analyzing
   string/min nullable     time:   [111.77 µs 112.19 µs 112.73 µs]
                           thrpt:  [581.36 Melem/s 584.15 Melem/s 586.33 Melem/s]
                    change:
                           time:   [+27.574% +27.967% +28.391%] (p = 0.00 < 0.05)
                           thrpt:  [-22.113% -21.855% -21.614%]
                           Performance has regressed.
   Found 4 outliers among 100 measurements (4.00%)
     1 (1.00%) low severe
     1 (1.00%) low mild
     1 (1.00%) high mild
     1 (1.00%) high severe
   Benchmarking string/max nullable
   Benchmarking string/max nullable: Warming up for 3.0000 s
   Benchmarking string/max nullable: Collecting 100 samples in estimated 5.5168 s (50k iterations)
   Benchmarking string/max nullable: Analyzing
   string/max nullable     time:   [108.08 µs 108.89 µs 109.66 µs]
                           thrpt:  [597.64 Melem/s 601.87 Melem/s 606.39 Melem/s]
                    change:
                           time:   [+26.078% +26.963% +27.772%] (p = 0.00 < 0.05)
                           thrpt:  [-21.736% -21.237% -20.684%]
                           Performance has regressed.
   Found 11 outliers among 100 measurements (11.00%)
     8 (8.00%) low severe
     2 (2.00%) high mild
     1 (1.00%) high severe
   ```
   
   </p>
   </details> 
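The `change:` and `thrpt:` rows above are two views of the same ratio. The reported values match a throughput change of `1 / (1 + Δtime) - 1` with the interval bounds swapped. For example, `+114.05%` in time lines up with `-53.282%` in throughput for `float32/sum nonnull`. Below is a minimal sketch for reading these logs. The helpers are hypothetical and belong to neither criterion nor arrow-rs. They do three things:

- convert a time change into a throughput change,
- recover the baseline time implied by a new measurement and its change,
- infer the bytes processed per iteration from the reported GiB/s.

```rust
/// Converts a relative time change (e.g. 1.1346 for "+113.46%") into the
/// matching relative throughput change. For a fixed amount of work,
/// throughput is work / time, so its ratio is the reciprocal of the time ratio.
fn thrpt_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}

/// Baseline time implied by a new measurement and its relative time change
/// (here the baseline is `master` with `simd`).
fn implied_baseline(new_time: f64, time_change: f64) -> f64 {
    new_time / (1.0 + time_change)
}

/// Bytes processed per iteration implied by a time (seconds) and a
/// throughput given in GiB/s (1 GiB = 2^30 bytes).
fn implied_bytes(time_secs: f64, gib_per_sec: f64) -> f64 {
    time_secs * gib_per_sec * (1u64 << 30) as f64
}

fn main() {
    // float32/sum nonnull (first run): 6.4915 µs, +113.46%, 37.609 GiB/s
    let (t_us, dt, thrpt) = (6.4915_f64, 1.1346_f64, 37.609_f64);
    println!("thrpt change:  {:+.3}%", 100.0 * thrpt_change(dt));
    println!("baseline time: {:.4} µs", implied_baseline(t_us, dt));
    println!("bytes/iter:    {:.0}", implied_bytes(t_us * 1e-6, thrpt));

    // int8/min nonnull (first run): 536.25 ns, -98.976%
    let (t_ns, dt) = (536.25_f64, -0.98976_f64);
    println!("int8/min baseline: {:.1} µs", implied_baseline(t_ns, dt) / 1000.0);
}
```

For `float32/sum nonnull` this works out to the following:

- a throughput change of about -53.15%,
- an implied `master` time of about 3.04 µs,
- roughly 262,142 bytes per iteration.

That byte count is close to 256 KiB, which would be 65,536 `f32` values. The array length is inferred from the throughput and is not stated in the log.

For `int8/min nonnull` the implied baseline is about 52.4 µs. That is why a ~0.54 µs result shows up as a ~99% time improvement.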
   
   
   ## `master` @ 61da64a0557c80af5bb43b5f15c6d8bb6a314cb2 with `simd` (nightly Rust) vs branch (stable Rust 1.73)
   
   <details><summary>Details</summary>
   <p>
   
   ```
        Running benches/aggregate_kernels.rs (target/release/deps/aggregate_kernels-9282db2d205ca86c)
   Benchmarking float32/sum nonnull
   Benchmarking float32/sum nonnull: Warming up for 3.0000 s
   Benchmarking float32/sum nonnull: Collecting 100 samples in estimated 5.0114 s (808k iterations)
   Benchmarking float32/sum nonnull: Analyzing
   float32/sum nonnull     time:   [6.2821 µs 6.3453 µs 6.4165 µs]
                           thrpt:  [38.049 GiB/s 38.476 GiB/s 38.863 GiB/s]
                    change:
                           time:   [+61.648% +63.000% +64.486%] (p = 0.00 < 0.05)
                           thrpt:  [-39.205% -38.650% -38.137%]
                           Performance has regressed.
   Found 16 outliers among 100 measurements (16.00%)
     2 (2.00%) high mild
     14 (14.00%) high severe
   Benchmarking float32/min nonnull
   Benchmarking float32/min nonnull: Warming up for 3.0000 s
   Benchmarking float32/min nonnull: Collecting 100 samples in estimated 5.1040 s (237k iterations)
   Benchmarking float32/min nonnull: Analyzing
   float32/min nonnull     time:   [20.695 µs 20.741 µs 20.796 µs]
                           thrpt:  [11.740 GiB/s 11.771 GiB/s 11.797 GiB/s]
                    change:
                           time:   [+65.066% +66.458% +68.206%] (p = 0.00 < 0.05)
                           thrpt:  [-40.549% -39.925% -39.418%]
                           Performance has regressed.
   Found 13 outliers among 100 measurements (13.00%)
     7 (7.00%) high mild
     6 (6.00%) high severe
   Benchmarking float32/max nonnull
   Benchmarking float32/max nonnull: Warming up for 3.0000 s
   Benchmarking float32/max nonnull: Collecting 100 samples in estimated 5.0635 s (247k iterations)
   Benchmarking float32/max nonnull: Analyzing
   float32/max nonnull     time:   [20.303 µs 20.329 µs 20.360 µs]
                           thrpt:  [11.991 GiB/s 12.009 GiB/s 12.025 GiB/s]
                    change:
                           time:   [+133.51% +134.13% +134.98%] (p = 0.00 < 0.05)
                           thrpt:  [-57.443% -57.290% -57.176%]
                           Performance has regressed.
   Found 9 outliers among 100 measurements (9.00%)
     4 (4.00%) high mild
     5 (5.00%) high severe
   Benchmarking float32/sum nullable
   Benchmarking float32/sum nullable: Warming up for 3.0000 s
   Benchmarking float32/sum nullable: Collecting 100 samples in estimated 5.0128 s (475k iterations)
   Benchmarking float32/sum nullable: Analyzing
   float32/sum nullable    time:   [10.589 µs 10.637 µs 10.699 µs]
                           thrpt:  [22.818 GiB/s 22.951 GiB/s 23.056 GiB/s]
                    change:
                           time:   [+78.695% +79.646% +80.680%] (p = 0.00 < 0.05)
                           thrpt:  [-44.654% -44.335% -44.039%]
                           Performance has regressed.
   Found 17 outliers among 100 measurements (17.00%)
     1 (1.00%) low mild
     4 (4.00%) high mild
     12 (12.00%) high severe
   Benchmarking float32/min nullable
   Benchmarking float32/min nullable: Warming up for 3.0000 s
   Benchmarking float32/min nullable: Collecting 100 samples in estimated 5.1710 s (106k iterations)
   Benchmarking float32/min nullable: Analyzing
   float32/min nullable    time:   [48.715 µs 48.766 µs 48.822 µs]
                           thrpt:  [5.0007 GiB/s 5.0064 GiB/s 5.0116 GiB/s]
                    change:
                           time:   [+33.507% +33.944% +34.497%] (p = 0.00 < 0.05)
                           thrpt:  [-25.649% -25.342% -25.098%]
                           Performance has regressed.
   Found 9 outliers among 100 measurements (9.00%)
     4 (4.00%) high mild
     5 (5.00%) high severe
   Benchmarking float32/max nullable
   Benchmarking float32/max nullable: Warming up for 3.0000 s
   Benchmarking float32/max nullable: Collecting 100 samples in estimated 5.0798 s (101k iterations)
   Benchmarking float32/max nullable: Analyzing
   float32/max nullable    time:   [48.964 µs 49.221 µs 49.548 µs]
                           thrpt:  [4.9273 GiB/s 4.9601 GiB/s 4.9862 GiB/s]
                    change:
                           time:   [+50.526% +51.990% +53.631%] (p = 0.00 < 0.05)
                           thrpt:  [-34.909% -34.206% -33.566%]
                           Performance has regressed.
   Found 15 outliers among 100 measurements (15.00%)
     4 (4.00%) high mild
     11 (11.00%) high severe
   
   Benchmarking float64/sum nonnull
   Benchmarking float64/sum nonnull: Warming up for 3.0000 s
   Benchmarking float64/sum nonnull: Collecting 100 samples in estimated 5.0237 s (439k iterations)
   Benchmarking float64/sum nonnull: Analyzing
   float64/sum nonnull     time:   [11.421 µs 11.552 µs 11.708 µs]
                           thrpt:  [41.704 GiB/s 42.269 GiB/s 42.751 GiB/s]
                    change:
                           time:   [+47.672% +49.203% +50.975%] (p = 0.00 < 0.05)
                           thrpt:  [-33.764% -32.977% -32.283%]
                           Performance has regressed.
   Found 11 outliers among 100 measurements (11.00%)
     1 (1.00%) high mild
     10 (10.00%) high severe
   Benchmarking float64/min nonnull
   Benchmarking float64/min nonnull: Warming up for 3.0000 s
   Benchmarking float64/min nonnull: Collecting 100 samples in estimated 5.0197 s (86k iterations)
   Benchmarking float64/min nonnull: Analyzing
   float64/min nonnull     time:   [57.389 µs 57.690 µs 58.120 µs]
                           thrpt:  [8.4013 GiB/s 8.4638 GiB/s 8.5083 GiB/s]
                    change:
                           time:   [+204.63% +209.91% +214.99%] (p = 0.00 < 0.05)
                           thrpt:  [-68.253% -67.732% -67.173%]
                           Performance has regressed.
   Found 19 outliers among 100 measurements (19.00%)
     1 (1.00%) high mild
     18 (18.00%) high severe
   Benchmarking float64/max nonnull
   Benchmarking float64/max nonnull: Warming up for 3.0000 s
   Benchmarking float64/max nonnull: Collecting 100 samples in estimated 5.1424 s (126k iterations)
   Benchmarking float64/max nonnull: Analyzing
   float64/max nonnull     time:   [40.828 µs 40.897 µs 40.968 µs]
                           thrpt:  [11.919 GiB/s 11.939 GiB/s 11.959 GiB/s]
                    change:
                           time:   [+216.42% +218.01% +219.31%] (p = 0.00 < 0.05)
                           thrpt:  [-68.683% -68.555% -68.397%]
                           Performance has regressed.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high severe
   Benchmarking float64/sum nullable
   Benchmarking float64/sum nullable: Warming up for 3.0000 s
   Benchmarking float64/sum nullable: Collecting 100 samples in estimated 5.0847 s (227k iterations)
   Benchmarking float64/sum nullable: Analyzing
   float64/sum nullable    time:   [22.622 µs 22.714 µs 22.811 µs]
                           thrpt:  [21.405 GiB/s 21.497 GiB/s 21.585 GiB/s]
                    change:
                           time:   [+161.10% +162.73% +164.31%] (p = 0.00 < 0.05)
                           thrpt:  [-62.165% -61.938% -61.701%]
                           Performance has regressed.
   Found 9 outliers among 100 measurements (9.00%)
     1 (1.00%) low mild
     7 (7.00%) high mild
     1 (1.00%) high severe
   Benchmarking float64/min nullable
   Benchmarking float64/min nullable: Warming up for 3.0000 s
   Benchmarking float64/min nullable: Collecting 100 samples in estimated 5.0221 s (50k iterations)
   Benchmarking float64/min nullable: Analyzing
   float64/min nullable    time:   [98.731 µs 99.523 µs 100.38 µs]
                           thrpt:  [4.8643 GiB/s 4.9062 GiB/s 4.9456 GiB/s]
                    change:
                           time:   [+173.41% +175.20% +176.95%] (p = 0.00 < 0.05)
                           thrpt:  [-63.892% -63.663% -63.425%]
                           Performance has regressed.
   Found 7 outliers among 100 measurements (7.00%)
     5 (5.00%) high mild
     2 (2.00%) high severe
   Benchmarking float64/max nullable
   Benchmarking float64/max nullable: Warming up for 3.0000 s
   Benchmarking float64/max nullable: Collecting 100 samples in estimated 5.0448 s (50k iterations)
   Benchmarking float64/max nullable: Analyzing
   float64/max nullable    time:   [98.333 µs 98.718 µs 99.144 µs]
                           thrpt:  [4.9250 GiB/s 4.9462 GiB/s 4.9656 GiB/s]
                    change:
                           time:   [+225.06% +226.85% +228.72%] (p = 0.00 < 0.05)
                           thrpt:  [-69.579% -69.405% -69.236%]
                           Performance has regressed.
   Found 4 outliers among 100 measurements (4.00%)
     2 (2.00%) high mild
     2 (2.00%) high severe
   
   Benchmarking int8/sum nonnull
   Benchmarking int8/sum nonnull: Warming up for 3.0000 s
   Benchmarking int8/sum nonnull: Collecting 100 samples in estimated 5.0006 s (9.3M iterations)
   Benchmarking int8/sum nonnull: Analyzing
   int8/sum nonnull        time:   [538.87 ns 540.39 ns 542.10 ns]
                           thrpt:  [112.59 GiB/s 112.95 GiB/s 113.27 GiB/s]
                    change:
                           time:   [+5.8410% +6.3350% +6.7450%] (p = 0.00 < 0.05)
                           thrpt:  [-6.3188% -5.9576% -5.5187%]
                           Performance has regressed.
   Found 8 outliers among 100 measurements (8.00%)
     6 (6.00%) high mild
     2 (2.00%) high severe
   Benchmarking int8/min nonnull
   Benchmarking int8/min nonnull: Warming up for 3.0000 s
   Benchmarking int8/min nonnull: Collecting 100 samples in estimated 5.0010 s (9.2M iterations)
   Benchmarking int8/min nonnull: Analyzing
   int8/min nonnull        time:   [539.84 ns 540.98 ns 542.21 ns]
                           thrpt:  [112.57 GiB/s 112.82 GiB/s 113.06 GiB/s]
                    change:
                           time:   [-98.907% -98.901% -98.895%] (p = 0.00 < 0.05)
                           thrpt:  [+8951.4% +8998.4% +9050.6%]
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     2 (2.00%) low severe
     2 (2.00%) high mild
     4 (4.00%) high severe
   Benchmarking int8/max nonnull
   Benchmarking int8/max nonnull: Warming up for 3.0000 s
   Benchmarking int8/max nonnull: Collecting 100 samples in estimated 5.0014 s (9.3M iterations)
   Benchmarking int8/max nonnull: Analyzing
   int8/max nonnull        time:   [538.51 ns 539.40 ns 540.38 ns]
                           thrpt:  [112.95 GiB/s 113.15 GiB/s 113.34 GiB/s]
                    change:
                           time:   [-98.893% -98.888% -98.884%] (p = 0.00 < 0.05)
                           thrpt:  [+8861.5% +8896.5% +8936.3%]
                           Performance has improved.
   Found 9 outliers among 100 measurements (9.00%)
     6 (6.00%) high mild
     3 (3.00%) high severe
   Benchmarking int8/sum nullable
   Benchmarking int8/sum nullable: Warming up for 3.0000 s
   Benchmarking int8/sum nullable: Collecting 100 samples in estimated 5.0027 s (702k iterations)
   Benchmarking int8/sum nullable: Analyzing
   int8/sum nullable       time:   [7.0904 µs 7.0965 µs 7.1035 µs]
                           thrpt:  [8.5923 GiB/s 8.6008 GiB/s 8.6082 GiB/s]
                    change:
                           time:   [+97.769% +98.683% +99.370%] (p = 0.00 < 0.05)
                           thrpt:  [-49.842% -49.669% -49.436%]
                           Performance has regressed.
   Found 7 outliers among 100 measurements (7.00%)
     1 (1.00%) low severe
     4 (4.00%) high mild
     2 (2.00%) high severe
   Benchmarking int8/min nullable
   Benchmarking int8/min nullable: Warming up for 3.0000 s
   Benchmarking int8/min nullable: Collecting 100 samples in estimated 5.0248 s (636k iterations)
   Benchmarking int8/min nullable: Analyzing
   int8/min nullable       time:   [7.8877 µs 7.9023 µs 7.9149 µs]
                           thrpt:  [7.7114 GiB/s 7.7237 GiB/s 7.7380 GiB/s]
                    change:
                           time:   [-78.217% -78.112% -78.025%] (p = 0.00 < 0.05)
                           thrpt:  [+355.06% +356.88% +359.07%]
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     2 (2.00%) low mild
     1 (1.00%) high severe
   Benchmarking int8/max nullable
   Benchmarking int8/max nullable: Warming up for 3.0000 s
   Benchmarking int8/max nullable: Collecting 100 samples in estimated 5.0380 s (626k iterations)
   Benchmarking int8/max nullable: Analyzing
   int8/max nullable       time:   [8.0177 µs 8.0409 µs 8.0667 µs]
                           thrpt:  [7.5663 GiB/s 7.5906 GiB/s 7.6125 GiB/s]
                    change:
                           time:   [-77.679% -77.566% -77.463%] (p = 0.00 < 0.05)
                           thrpt:  [+343.72% +345.75% +348.00%]
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     4 (4.00%) high mild
     1 (1.00%) high severe
   
   Benchmarking int16/sum nonnull
   Benchmarking int16/sum nonnull: Warming up for 3.0000 s
   Benchmarking int16/sum nonnull: Collecting 100 samples in estimated 5.0034 s (4.6M iterations)
   Benchmarking int16/sum nonnull: Analyzing
   int16/sum nonnull       time:   [1.0893 µs 1.0911 µs 1.0932 µs]
                           thrpt:  [111.66 GiB/s 111.88 GiB/s 112.07 GiB/s]
                    change:
                           time:   [+4.5498% +5.0884% +5.5881%] (p = 0.00 < 0.05)
                           thrpt:  [-5.2924% -4.8420% -4.3518%]
                           Performance has regressed.
   Found 3 outliers among 100 measurements (3.00%)
     1 (1.00%) high mild
     2 (2.00%) high severe
   Benchmarking int16/min nonnull
   Benchmarking int16/min nonnull: Warming up for 3.0000 s
   Benchmarking int16/min nonnull: Collecting 100 samples in estimated 5.0053 s (4.6M iterations)
   Benchmarking int16/min nonnull: Analyzing
   int16/min nonnull       time:   [1.0906 µs 1.0936 µs 1.0969 µs]
                           thrpt:  [111.29 GiB/s 111.62 GiB/s 111.93 GiB/s]
                    change:
                           time:   [-1.3424% -0.6267% +0.0176%] (p = 0.07 > 0.05)
                           thrpt:  [-0.0176% +0.6307% +1.3607%]
                           No change in performance detected.
   Found 10 outliers among 100 measurements (10.00%)
     1 (1.00%) low severe
     4 (4.00%) high mild
     5 (5.00%) high severe
   Benchmarking int16/max nonnull
   Benchmarking int16/max nonnull: Warming up for 3.0000 s
   Benchmarking int16/max nonnull: Collecting 100 samples in estimated 5.0037 s (4.5M iterations)
   Benchmarking int16/max nonnull: Analyzing
   int16/max nonnull       time:   [1.0981 µs 1.1014 µs 1.1051 µs]
                           thrpt:  [110.46 GiB/s 110.83 GiB/s 111.17 GiB/s]
                    change:
                           time:   [+1.0907% +1.6125% +2.0946%] (p = 0.00 < 0.05)
                           thrpt:  [-2.0516% -1.5869% -1.0789%]
                           Performance has regressed.
   Found 3 outliers among 100 measurements (3.00%)
     1 (1.00%) low mild
     2 (2.00%) high mild
   Benchmarking int16/sum nullable
   Benchmarking int16/sum nullable: Warming up for 3.0000 s
   Benchmarking int16/sum nullable: Collecting 100 samples in estimated 5.0260 s (641k iterations)
   Benchmarking int16/sum nullable: Analyzing
   int16/sum nullable      time:   [7.6647 µs 7.6795 µs 7.6975 µs]
                           thrpt:  [15.858 GiB/s 15.896 GiB/s 15.926 GiB/s]
                    change:
                           time:   [+83.613% +85.760% +87.753%] (p = 0.00 < 0.05)
                           thrpt:  [-46.739% -46.167% -45.538%]
                           Performance has regressed.
   Found 7 outliers among 100 measurements (7.00%)
     4 (4.00%) high mild
     3 (3.00%) high severe
   Benchmarking int16/min nullable
   Benchmarking int16/min nullable: Warming up for 3.0000 s
   Benchmarking int16/min nullable: Collecting 100 samples in estimated 5.0478 s (389k iterations)
   Benchmarking int16/min nullable: Analyzing
   int16/min nullable      time:   [13.082 µs 13.132 µs 13.184 µs]
                           thrpt:  [9.2592 GiB/s 9.2959 GiB/s 9.3310 GiB/s]
                    change:
                           time:   [+98.383% +99.218% +99.994%] (p = 0.00 < 0.05)
                           thrpt:  [-49.999% -49.804% -49.593%]
                           Performance has regressed.
   Found 13 outliers among 100 measurements (13.00%)
     1 (1.00%) low severe
     4 (4.00%) low mild
     6 (6.00%) high mild
     2 (2.00%) high severe
   Benchmarking int16/max nullable
   Benchmarking int16/max nullable: Warming up for 3.0000 s
   Benchmarking int16/max nullable: Collecting 100 samples in estimated 5.0205 s (384k iterations)
   Benchmarking int16/max nullable: Analyzing
   int16/max nullable      time:   [13.091 µs 13.122 µs 13.153 µs]
                           thrpt:  [9.2811 GiB/s 9.3029 GiB/s 9.3244 GiB/s]
                    change:
                           time:   [+91.868% +93.748% +95.419%] (p = 0.00 < 0.05)
                           thrpt:  [-48.828% -48.387% -47.881%]
                           Performance has regressed.
   Found 12 outliers among 100 measurements (12.00%)
     2 (2.00%) low severe
     1 (1.00%) low mild
     7 (7.00%) high mild
     2 (2.00%) high severe
   
   Benchmarking int32/sum nonnull
   Benchmarking int32/sum nonnull: Warming up for 3.0000 s
   Benchmarking int32/sum nonnull: Collecting 100 samples in estimated 5.0060 s (2.0M iterations)
   Benchmarking int32/sum nonnull: Analyzing
   int32/sum nonnull       time:   [2.4976 µs 2.5039 µs 2.5107 µs]
                           thrpt:  [97.242 GiB/s 97.505 GiB/s 97.751 GiB/s]
                    change:
                           time:   [+2.8546% +3.1854% +3.4974%] (p = 0.00 < 0.05)
                           thrpt:  [-3.3792% -3.0871% -2.7753%]
                           Performance has regressed.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) high mild
     1 (1.00%) high severe
   Benchmarking int32/min nonnull
   Benchmarking int32/min nonnull: Warming up for 3.0000 s
   Benchmarking int32/min nonnull: Collecting 100 samples in estimated 5.0091 s (2.0M iterations)
   Benchmarking int32/min nonnull: Analyzing
   int32/min nonnull       time:   [2.4869 µs 2.4930 µs 2.4997 µs]
                           thrpt:  [97.668 GiB/s 97.930 GiB/s 98.169 GiB/s]
                    change:
                           time:   [+1.7370% +2.0986% +2.4353%] (p = 0.00 < 0.05)
                           thrpt:  [-2.3774% -2.0554% -1.7073%]
                           Performance has regressed.
   Found 11 outliers among 100 measurements (11.00%)
     9 (9.00%) high mild
     2 (2.00%) high severe
   Benchmarking int32/max nonnull
   Benchmarking int32/max nonnull: Warming up for 3.0000 s
   Benchmarking int32/max nonnull: Collecting 100 samples in estimated 5.0063 s (2.0M iterations)
   Benchmarking int32/max nonnull: Analyzing
   int32/max nonnull       time:   [2.5015 µs 2.5094 µs 2.5184 µs]
                           thrpt:  [96.944 GiB/s 97.291 GiB/s 97.597 GiB/s]
                    change:
                           time:   [+2.5025% +2.8808% +3.3134%] (p = 0.00 < 0.05)
                           thrpt:  [-3.2072% -2.8001% -2.4414%]
                           Performance has regressed.
   Found 7 outliers among 100 measurements (7.00%)
     6 (6.00%) high mild
     1 (1.00%) high severe
   Benchmarking int32/sum nullable
   Benchmarking int32/sum nullable: Warming up for 3.0000 s
   Benchmarking int32/sum nullable: Collecting 100 samples in estimated 5.0198 s (561k iterations)
   Benchmarking int32/sum nullable: Analyzing
   int32/sum nullable      time:   [8.9102 µs 8.9258 µs 8.9431 µs]
                           thrpt:  [27.299 GiB/s 27.352 GiB/s 27.400 GiB/s]
                    change:
                           time:   [+91.210% +91.759% +92.305%] (p = 0.00 < 0.05)
                           thrpt:  [-47.999% -47.851% -47.701%]
                           Performance has regressed.
   Found 9 outliers among 100 measurements (9.00%)
     5 (5.00%) high mild
     4 (4.00%) high severe
   Benchmarking int32/min nullable
   Benchmarking int32/min nullable: Warming up for 3.0000 s
   Benchmarking int32/min nullable: Collecting 100 samples in estimated 5.1210 s (172k iterations)
   Benchmarking int32/min nullable: Analyzing
   int32/min nullable      time:   [29.749 µs 29.815 µs 29.885 µs]
                           thrpt:  [8.1693 GiB/s 8.1885 GiB/s 8.2065 GiB/s]
                    change:
                           time:   [+14.072% +14.457% +14.836%] (p = 0.00 < 0.05)
                           thrpt:  [-12.919% -12.631% -12.336%]
                           Performance has regressed.
   Found 13 outliers among 100 measurements (13.00%)
     10 (10.00%) high mild
     3 (3.00%) high severe
   Benchmarking int32/max nullable
   Benchmarking int32/max nullable: Warming up for 3.0000 s
   Benchmarking int32/max nullable: Collecting 100 samples in estimated 5.1090 s (172k iterations)
   Benchmarking int32/max nullable: Analyzing
   int32/max nullable      time:   [29.923 µs 30.000 µs 30.072 µs]
                           thrpt:  [8.1185 GiB/s 8.1381 GiB/s 8.1588 GiB/s]
                    change:
                           time:   [+14.095% +14.522% +14.892%] (p = 0.00 < 0.05)
                           thrpt:  [-12.962% -12.681% -12.354%]
                           Performance has regressed.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) low mild
     1 (1.00%) high mild
   
   Benchmarking int64/sum nonnull
   Benchmarking int64/sum nonnull: Warming up for 3.0000 s
   Benchmarking int64/sum nonnull: Collecting 100 samples in estimated 5.0249 s (1.0M iterations)
   Benchmarking int64/sum nonnull: Analyzing
   int64/sum nonnull       time:   [4.9775 µs 4.9915 µs 5.0056 µs]
                           thrpt:  [97.547 GiB/s 97.823 GiB/s 98.098 GiB/s]
                    change:
                           time:   [+3.8393% +4.1991% +4.5411%] (p = 0.00 < 0.05)
                           thrpt:  [-4.3439% -4.0298% -3.6973%]
                           Performance has regressed.
   Found 3 outliers among 100 measurements (3.00%)
     2 (2.00%) high mild
     1 (1.00%) high severe
   Benchmarking int64/min nonnull
   Benchmarking int64/min nonnull: Warming up for 3.0000 s
   Benchmarking int64/min nonnull: Collecting 100 samples in estimated 5.0209 s (550k iterations)
   Benchmarking int64/min nonnull: Analyzing
   int64/min nonnull       time:   [9.0822 µs 9.0970 µs 9.1127 µs]
                           thrpt:  [53.582 GiB/s 53.675 GiB/s 53.763 GiB/s]
                    change:
                           time:   [+0.1531% +1.1697% +2.1142%] (p = 0.02 < 0.05)
                           thrpt:  [-2.0704% -1.1561% -0.1529%]
                           Change within noise threshold.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) high mild
     1 (1.00%) high severe
   Benchmarking int64/max nonnull
   Benchmarking int64/max nonnull: Warming up for 3.0000 s
   Benchmarking int64/max nonnull: Collecting 100 samples in estimated 5.0093 s (550k iterations)
   Benchmarking int64/max nonnull: Analyzing
   int64/max nonnull       time:   [9.0871 µs 9.1068 µs 9.1282 µs]
                           thrpt:  [53.491 GiB/s 53.617 GiB/s 53.733 GiB/s]
                    change:
                           time:   [+3.3412% +3.6883% +4.0543%] (p = 0.00 < 0.05)
                           thrpt:  [-3.8964% -3.5571% -3.2332%]
                           Performance has regressed.
   Found 5 outliers among 100 measurements (5.00%)
     4 (4.00%) high mild
     1 (1.00%) high severe
   Benchmarking int64/sum nullable
   Benchmarking int64/sum nullable: Warming up for 3.0000 s
   Benchmarking int64/sum nullable: Collecting 100 samples in estimated 5.0577 s (288k iterations)
   Benchmarking int64/sum nullable: Analyzing
   int64/sum nullable      time:   [17.531 µs 17.592 µs 17.653 µs]
                           thrpt:  [27.659 GiB/s 27.756 GiB/s 27.852 GiB/s]
                    change:
                           time:   [+84.485% +86.149% +87.741%] (p = 0.00 < 0.05)
                           thrpt:  [-46.735% -46.280% -45.795%]
                           Performance has regressed.
   Found 14 outliers among 100 measurements (14.00%)
     10 (10.00%) high mild
     4 (4.00%) high severe
   Benchmarking int64/min nullable
   Benchmarking int64/min nullable: Warming up for 3.0000 s
   Benchmarking int64/min nullable: Collecting 100 samples in estimated 5.1196 s (116k iterations)
   Benchmarking int64/min nullable: Analyzing
   int64/min nullable      time:   [43.743 µs 43.834 µs 43.935 µs]
                           thrpt:  [11.114 GiB/s 11.139 GiB/s 11.163 GiB/s]
                    change:
                           time:   [+66.879% +67.747% +68.541%] (p = 0.00 < 0.05)
                           thrpt:  [-40.667% -40.387% -40.076%]
                           Performance has regressed.
   Found 7 outliers among 100 measurements (7.00%)
     2 (2.00%) low mild
     3 (3.00%) high mild
     2 (2.00%) high severe
   Benchmarking int64/max nullable
   Benchmarking int64/max nullable: Warming up for 3.0000 s
   Benchmarking int64/max nullable: Collecting 100 samples in estimated 5.0448 s (116k iterations)
   Benchmarking int64/max nullable: Analyzing
   int64/max nullable      time:   [43.566 µs 43.668 µs 43.765 µs]
                           thrpt:  [11.157 GiB/s 11.182 GiB/s 11.208 GiB/s]
                    change:
                           time:   [+67.097% +67.871% +68.564%] (p = 0.00 < 0.05)
                           thrpt:  [-40.675% -40.431% -40.155%]
                           Performance has regressed.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   Benchmarking string/min nonnull
   Benchmarking string/min nonnull: Warming up for 3.0000 s
   Benchmarking string/min nonnull: Collecting 100 samples in estimated 5.0449 s (40k iterations)
   Benchmarking string/min nonnull: Analyzing
   string/min nonnull      time:   [124.28 µs 124.53 µs 124.82 µs]
                           thrpt:  [525.03 Melem/s 526.26 Melem/s 527.33 Melem/s]
                    change:
                           time:   [-1.5237% -0.5664% +0.2709%] (p = 0.23 > 0.05)
                           thrpt:  [-0.2701% +0.5696% +1.5473%]
                           No change in performance detected.
   Found 8 outliers among 100 measurements (8.00%)
     7 (7.00%) high mild
     1 (1.00%) high severe
   Benchmarking string/max nonnull
   Benchmarking string/max nonnull: Warming up for 3.0000 s
   Benchmarking string/max nonnull: Collecting 100 samples in estimated 5.6891 s (40k iterations)
   Benchmarking string/max nonnull: Analyzing
   string/max nonnull      time:   [140.92 µs 141.13 µs 141.35 µs]
                           thrpt:  [463.63 Melem/s 464.35 Melem/s 465.05 Melem/s]
                    change:
                           time:   [+2.2748% +2.9322% +3.4956%] (p = 0.00 < 0.05)
                           thrpt:  [-3.3776% -2.8486% -2.2242%]
                           Performance has regressed.
   Found 6 outliers among 100 measurements (6.00%)
     5 (5.00%) high mild
     1 (1.00%) high severe
   Benchmarking string/min nullable
   Benchmarking string/min nullable: Warming up for 3.0000 s
   Benchmarking string/min nullable: Collecting 100 samples in estimated 5.4040 s (61k iterations)
   Benchmarking string/min nullable: Analyzing
   string/min nullable     time:   [87.851 µs 87.970 µs 88.096 µs]
                           thrpt:  [743.91 Melem/s 744.98 Melem/s 745.99 Melem/s]
                    change:
                           time:   [+0.1933% +0.4888% +0.8375%] (p = 0.00 < 0.05)
                           thrpt:  [-0.8306% -0.4864% -0.1929%]
                           Change within noise threshold.
   Found 5 outliers among 100 measurements (5.00%)
     3 (3.00%) high mild
     2 (2.00%) high severe
   Benchmarking string/max nullable
   Benchmarking string/max nullable: Warming up for 3.0000 s
   Benchmarking string/max nullable: Collecting 100 samples in estimated 5.3001 s (61k iterations)
   Benchmarking string/max nullable: Analyzing
   string/max nullable     time:   [85.671 µs 86.304 µs 87.363 µs]
                           thrpt:  [750.16 Melem/s 759.36 Melem/s 764.97 Melem/s]
                    change:
                           time:   [-8.1227% -6.5198% -4.9163%] (p = 0.00 < 0.05)
                           thrpt:  [+5.1705% +6.9746% +8.8408%]
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     4 (4.00%) high mild
     4 (4.00%) high severe
   
   ```
   
   </p>
   </details> 
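
A note on reading the numbers above: criterion reports each change twice, once as a time change and once as a throughput change, and the two are not simply negatives of each other. For a fixed input size, throughput is inversely proportional to time. A time change of `t`% therefore corresponds to a throughput change of `(1 / (1 + t/100) - 1) * 100`%. For example, `float32/sum nonnull` shows a +63.000% time change and a -38.650% throughput change. The sketch below shows the conversion. The helper name `throughput_change_pct` is hypothetical and is not part of criterion's API.

```rust
/// Hypothetical helper (not part of criterion): converts a percentage change in
/// time into the matching percentage change in throughput. Throughput is
/// bytes / time, so with the byte count fixed, new_thrpt / old_thrpt equals
/// old_time / new_time.
fn throughput_change_pct(time_change_pct: f64) -> f64 {
    let time_ratio = 1.0 + time_change_pct / 100.0; // new_time / old_time
    (1.0 / time_ratio - 1.0) * 100.0
}

fn main() {
    // Point estimates (middle values of the time change) copied from the output above.
    let samples = [
        ("float32/sum nonnull", 63.000),
        ("float64/max nonnull", 218.01),
        ("int8/min nullable", -78.112),
    ];
    for (name, time_pct) in samples {
        println!(
            "{name}: time {time_pct:+.3}% -> thrpt {:+.3}%",
            throughput_change_pct(time_pct)
        );
    }
}
```

These three conversions land within about 0.01 percentage points of the throughput changes criterion printed (-38.650%, -68.555%, +356.88%). The small gaps presumably come from criterion working with unrounded estimates.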
   
   
   
   <details><summary>Test script</summary>
   <p>
   
   ```bash
   #git merge-base HEAD origin/master
   #61da64a0557c80af5bb43b5f15c6d8bb6a314cb2
   
   #gh pr checkout https://github.com/apache/arrow-rs/pull/5100
   
   echo "***compare using nightly***"
   git checkout 61da64a0557c80af5bb43b5f15c6d8bb6a314cb2
   RUSTFLAGS="-Ctarget-cpu=native" cargo +nightly bench --features=simd --bench aggregate_kernels
   gh pr checkout https://github.com/apache/arrow-rs/pull/5100
   RUSTFLAGS="-Ctarget-cpu=native" cargo +nightly bench --features=simd --bench aggregate_kernels
   
   echo "*** compare using stable ***"
   git checkout 61da64a0557c80af5bb43b5f15c6d8bb6a314cb2
   RUSTFLAGS="-Ctarget-cpu=native" cargo +nightly bench --features=simd --bench aggregate_kernels
   gh pr checkout https://github.com/apache/arrow-rs/pull/5100
   RUSTFLAGS="-Ctarget-cpu=native" cargo +1.73.0 bench --bench aggregate_kernels
   ```
   
   
   
   </p>
   </details> 
   
   

