jhorstmann commented on PR #5100:
URL: https://github.com/apache/arrow-rs/pull/5100#issuecomment-1817954032
Benchmarks on `1.76.0-nightly (6790a5127 2023-11-10)`, against master
(#61da64a055) with `simd` feature:
Nullable sum regresses for float32, float64, and int32, but throughput for
them is still in the 40-68 GiB/s range with data in caches.
Nullable sum of int8 regresses heavily (about 11x slower, down to roughly
2.3 GiB/s); LLVM did not optimize that case properly.
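For readers unfamiliar with what the "nullable" benchmarks exercise, here is a minimal sketch of a sum over values with a validity bitmap, next to the non-null case. It is not the code from this PR: the function names are hypothetical, the bitmap layout (one bit per element, least-significant bit first) follows the Arrow format, and wrapping addition is just a choice made for the sketch. Whether LLVM vectorizes such a loop well depends on the element type and target, which is the kind of difference the int8 result suggests.

```rust
/// Sums `values`, skipping slots whose validity bit is 0.
/// `validity` packs one bit per element, least-significant bit first.
/// Returns `None` when no slot is valid.
/// Hypothetical helper for illustration, not the arrow-rs kernel.
fn sum_nullable(values: &[i32], validity: &[u8]) -> Option<i32> {
    assert!(validity.len() * 8 >= values.len());
    let mut sum = 0i32;
    let mut valid_count = 0usize;
    for (i, &v) in values.iter().enumerate() {
        // Extract the validity bit for slot `i` (0 or 1).
        let bit = (validity[i / 8] >> (i % 8)) & 1;
        // Branch-free select: the mask is all ones for valid slots, zero otherwise.
        let mask = -(bit as i32);
        sum = sum.wrapping_add(v & mask);
        valid_count += bit as usize;
    }
    if valid_count == 0 { None } else { Some(sum) }
}

/// Non-null case: every slot contributes, so no bitmap is consulted.
fn sum_nonnull(values: &[i32]) -> i32 {
    values.iter().fold(0i32, |acc, &v| acc.wrapping_add(v))
}
```

For example, `sum_nullable(&[1, 2, 3], &[0b101])` adds only slots 0 and 2.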
```
float32/sum nonnull time: [1.7372 µs 1.7402 µs 1.7440 µs]
thrpt: [139.99 GiB/s 140.30 GiB/s 140.54 GiB/s]
change:
time: [-50.559% -50.500% -50.436%] (p = 0.00 < 0.05)
thrpt: [+101.76% +102.02% +102.26%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low severe
1 (1.00%) low mild
4 (4.00%) high mild
2 (2.00%) high severe
float32/min nonnull time: [4.0109 µs 4.0130 µs 4.0154 µs]
thrpt: [60.801 GiB/s 60.838 GiB/s 60.869 GiB/s]
change:
time: [-24.692% -24.613% -24.543%] (p = 0.00 < 0.05)
thrpt: [+32.527% +32.648% +32.788%]
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
float32/max nonnull time: [4.0244 µs 4.0273 µs 4.0309 µs]
thrpt: [60.567 GiB/s 60.621 GiB/s 60.665 GiB/s]
change:
time: [-13.784% -13.683% -13.584%] (p = 0.00 < 0.05)
thrpt: [+15.719% +15.851% +15.988%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
5 (5.00%) high mild
2 (2.00%) high severe
float32/sum nullable time: [6.0198 µs 6.0217 µs 6.0239 µs]
thrpt: [40.529 GiB/s 40.544 GiB/s 40.556 GiB/s]
change:
time: [+70.043% +70.174% +70.289%] (p = 0.00 < 0.05)
thrpt: [-41.276% -41.237% -41.191%]
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low mild
4 (4.00%) high mild
5 (5.00%) high severe
float32/min nullable time: [7.5351 µs 7.5411 µs 7.5494 µs]
thrpt: [32.339 GiB/s 32.374 GiB/s 32.401 GiB/s]
change:
time: [-32.667% -32.546% -32.431%] (p = 0.00 < 0.05)
thrpt: [+47.997% +48.250% +48.516%]
Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
1 (1.00%) low severe
1 (1.00%) low mild
6 (6.00%) high mild
8 (8.00%) high severe
float32/max nullable time: [7.5403 µs 7.5424 µs 7.5448 µs]
thrpt: [32.359 GiB/s 32.369 GiB/s 32.378 GiB/s]
change:
time: [-28.813% -28.771% -28.730%] (p = 0.00 < 0.05)
thrpt: [+40.311% +40.393% +40.475%]
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe
float64/sum nonnull time: [3.5015 µs 3.5024 µs 3.5036 µs]
thrpt: [139.37 GiB/s 139.41 GiB/s 139.45 GiB/s]
change:
time: [-50.349% -50.295% -50.252%] (p = 0.00 < 0.05)
thrpt: [+101.01% +101.19% +101.41%]
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) high mild
7 (7.00%) high severe
float64/min nonnull time: [7.9437 µs 7.9478 µs 7.9522 µs]
thrpt: [61.402 GiB/s 61.436 GiB/s 61.467 GiB/s]
change:
time: [-25.173% -25.111% -25.052%] (p = 0.00 < 0.05)
thrpt: [+33.425% +33.531% +33.642%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low mild
4 (4.00%) high mild
2 (2.00%) high severe
float64/max nonnull time: [7.9798 µs 7.9827 µs 7.9859 µs]
thrpt: [61.143 GiB/s 61.167 GiB/s 61.189 GiB/s]
change:
time: [-14.413% -14.310% -14.219%] (p = 0.00 < 0.05)
thrpt: [+16.576% +16.700% +16.840%]
Performance has improved.
float64/sum nullable time: [11.458 µs 11.464 µs 11.472 µs]
thrpt: [42.563 GiB/s 42.594 GiB/s 42.616 GiB/s]
change:
time: [+62.084% +62.247% +62.395%] (p = 0.00 < 0.05)
thrpt: [-38.422% -38.365% -38.304%]
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
3 (3.00%) high mild
6 (6.00%) high severe
float64/min nullable time: [17.334 µs 17.348 µs 17.368 µs]
thrpt: [28.114 GiB/s 28.146 GiB/s 28.170 GiB/s]
change:
time: [-22.525% -22.443% -22.355%] (p = 0.00 < 0.05)
thrpt: [+28.791% +28.937% +29.073%]
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
5 (5.00%) high mild
5 (5.00%) high severe
float64/max nullable time: [17.317 µs 17.320 µs 17.325 µs]
thrpt: [28.184 GiB/s 28.192 GiB/s 28.197 GiB/s]
change:
time: [-19.175% -19.109% -19.026%] (p = 0.00 < 0.05)
thrpt: [+23.496% +23.623% +23.724%]
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
3 (3.00%) low mild
2 (2.00%) high mild
9 (9.00%) high severe
int8/sum nonnull time: [294.09 ns 294.29 ns 294.53 ns]
thrpt: [207.23 GiB/s 207.40 GiB/s 207.54 GiB/s]
change:
time: [-5.4940% -5.3941% -5.2948%] (p = 0.00 < 0.05)
thrpt: [+5.5908% +5.7017% +5.8134%]
Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
2 (2.00%) low severe
9 (9.00%) high mild
4 (4.00%) high severe
int8/min nonnull time: [290.35 ns 290.44 ns 290.54 ns]
thrpt: [210.07 GiB/s 210.15 GiB/s 210.21 GiB/s]
change:
time: [-99.378% -99.377% -99.376%] (p = 0.00 < 0.05)
thrpt: [+15927% +15946% +15965%]
Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
8 (8.00%) high mild
5 (5.00%) high severe
int8/max nonnull time: [291.20 ns 291.49 ns 291.86 ns]
thrpt: [209.13 GiB/s 209.39 GiB/s 209.60 GiB/s]
change:
time: [-99.377% -99.376% -99.376%] (p = 0.00 < 0.05)
thrpt: [+15920% +15935% +15948%]
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
5 (5.00%) high mild
5 (5.00%) high severe
int8/sum nullable time: [26.738 µs 26.753 µs 26.774 µs]
thrpt: [2.2797 GiB/s 2.2814 GiB/s 2.2827 GiB/s]
change:
time: [+991.84% +992.98% +994.02%] (p = 0.00 < 0.05)
thrpt: [-90.859% -90.851% -90.841%]
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
1 (1.00%) low severe
6 (6.00%) high mild
2 (2.00%) high severe
int8/min nullable time: [32.057 µs 32.083 µs 32.117 µs]
thrpt: [1.9004 GiB/s 1.9024 GiB/s 1.9040 GiB/s]
change:
time: [-2.1868% -2.1010% -2.0001%] (p = 0.00 < 0.05)
thrpt: [+2.0409% +2.1461% +2.2357%]
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low severe
1 (1.00%) low mild
2 (2.00%) high mild
2 (2.00%) high severe
int8/max nullable time: [32.065 µs 32.076 µs 32.089 µs]
thrpt: [1.9020 GiB/s 1.9028 GiB/s 1.9035 GiB/s]
change:
time: [-2.1908% -2.0415% -1.9107%] (p = 0.00 < 0.05)
thrpt: [+1.9479% +2.0841% +2.2398%]
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe
int16/sum nonnull time: [586.81 ns 587.25 ns 587.88 ns]
thrpt: [207.65 GiB/s 207.87 GiB/s 208.03 GiB/s]
change:
time: [-8.0350% -7.9283% -7.8161%] (p = 0.00 < 0.05)
thrpt: [+8.4788% +8.6110% +8.7371%]
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low severe
3 (3.00%) high mild
8 (8.00%) high severe
int16/min nonnull time: [581.69 ns 582.17 ns 582.68 ns]
thrpt: [209.50 GiB/s 209.68 GiB/s 209.86 GiB/s]
change:
time: [-17.439% -17.295% -17.181%] (p = 0.00 < 0.05)
thrpt: [+20.745% +20.911% +21.123%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe
int16/max nonnull time: [582.06 ns 582.44 ns 582.89 ns]
thrpt: [209.42 GiB/s 209.58 GiB/s 209.72 GiB/s]
change:
time: [-17.122% -17.038% -16.954%] (p = 0.00 < 0.05)
thrpt: [+20.415% +20.537% +20.660%]
Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
9 (9.00%) high mild
4 (4.00%) high severe
int16/sum nullable time: [3.4535 µs 3.4592 µs 3.4665 µs]
thrpt: [35.214 GiB/s 35.288 GiB/s 35.346 GiB/s]
change:
time: [+33.345% +33.745% +34.165%] (p = 0.00 < 0.05)
thrpt: [-25.465% -25.231% -25.007%]
Performance has regressed.
Found 17 outliers among 100 measurements (17.00%)
4 (4.00%) high mild
13 (13.00%) high severe
int16/min nullable time: [3.8848 µs 3.8877 µs 3.8917 µs]
thrpt: [31.367 GiB/s 31.399 GiB/s 31.423 GiB/s]
change:
time: [-50.105% -50.001% -49.911%] (p = 0.00 < 0.05)
thrpt: [+99.643% +100.01% +100.42%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
int16/max nullable time: [3.8780 µs 3.8787 µs 3.8795 µs]
thrpt: [31.465 GiB/s 31.472 GiB/s 31.478 GiB/s]
change:
time: [-49.958% -49.919% -49.884%] (p = 0.00 < 0.05)
thrpt: [+99.539% +99.676% +99.833%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) high mild
5 (5.00%) high severe
int32/sum nonnull time: [1.1740 µs 1.1750 µs 1.1762 µs]
thrpt: [207.57 GiB/s 207.77 GiB/s 207.96 GiB/s]
change:
time: [-7.7315% -7.6553% -7.5840%] (p = 0.00 < 0.05)
thrpt: [+8.2064% +8.2899% +8.3793%]
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
6 (6.00%) high mild
3 (3.00%) high severe
int32/min nonnull time: [1.1689 µs 1.1698 µs 1.1709 µs]
thrpt: [208.51 GiB/s 208.70 GiB/s 208.86 GiB/s]
change:
time: [-9.1922% -9.1072% -9.0159%] (p = 0.00 < 0.05)
thrpt: [+9.9093% +10.020% +10.123%]
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) high mild
7 (7.00%) high severe
int32/max nonnull time: [1.1726 µs 1.1735 µs 1.1745 µs]
thrpt: [207.87 GiB/s 208.05 GiB/s 208.20 GiB/s]
change:
time: [-8.8938% -8.7978% -8.7055%] (p = 0.00 < 0.05)
thrpt: [+9.5357% +9.6464% +9.7621%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low mild
3 (3.00%) high mild
3 (3.00%) high severe
int32/sum nullable time: [3.5555 µs 3.5582 µs 3.5623 µs]
thrpt: [68.534 GiB/s 68.613 GiB/s 68.666 GiB/s]
change:
time: [+94.123% +94.291% +94.479%] (p = 0.00 < 0.05)
thrpt: [-48.581% -48.531% -48.486%]
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) low mild
4 (4.00%) high mild
1 (1.00%) high severe
int32/min nullable time: [4.4264 µs 4.4289 µs 4.4326 µs]
thrpt: [55.078 GiB/s 55.124 GiB/s 55.156 GiB/s]
change:
time: [-52.072% -52.020% -51.971%] (p = 0.00 < 0.05)
thrpt: [+108.21% +108.42% +108.65%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low mild
2 (2.00%) high mild
5 (5.00%) high severe
int32/max nullable time: [4.4262 µs 4.4273 µs 4.4286 µs]
thrpt: [55.128 GiB/s 55.144 GiB/s 55.158 GiB/s]
change:
time: [-51.876% -51.846% -51.816%] (p = 0.00 < 0.05)
thrpt: [+107.54% +107.67% +107.80%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low severe
1 (1.00%) low mild
2 (2.00%) high mild
3 (3.00%) high severe
int64/sum nonnull time: [2.4495 µs 2.4505 µs 2.4516 µs]
thrpt: [199.16 GiB/s 199.26 GiB/s 199.34 GiB/s]
change:
time: [-2.7433% -2.6737% -2.5985%] (p = 0.00 < 0.05)
thrpt: [+2.6678% +2.7472% +2.8206%]
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
4 (4.00%) high mild
5 (5.00%) high severe
int64/min nonnull time: [2.4500 µs 2.4507 µs 2.4515 µs]
thrpt: [199.18 GiB/s 199.24 GiB/s 199.30 GiB/s]
change:
time: [-52.711% -52.667% -52.625%] (p = 0.00 < 0.05)
thrpt: [+111.08% +111.27% +111.47%]
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
2 (2.00%) low severe
3 (3.00%) low mild
2 (2.00%) high mild
4 (4.00%) high severe
int64/max nonnull time: [2.4526 µs 2.4547 µs 2.4574 µs]
thrpt: [198.70 GiB/s 198.91 GiB/s 199.09 GiB/s]
change:
time: [-52.667% -52.615% -52.555%] (p = 0.00 < 0.05)
thrpt: [+110.77% +111.04% +111.27%]
Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
1 (1.00%) low mild
9 (9.00%) high mild
8 (8.00%) high severe
int64/sum nullable time: [3.6973 µs 3.6994 µs 3.7020 µs]
thrpt: [131.90 GiB/s 131.99 GiB/s 132.07 GiB/s]
change:
time: [+2.6957% +2.7972% +2.9089%] (p = 0.00 < 0.05)
thrpt: [-2.8266% -2.7211% -2.6249%]
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low severe
1 (1.00%) low mild
2 (2.00%) high mild
2 (2.00%) high severe
int64/min nullable time: [12.353 µs 12.361 µs 12.372 µs]
thrpt: [39.467 GiB/s 39.503 GiB/s 39.529 GiB/s]
change:
time: [-33.195% -33.137% -33.078%] (p = 0.00 < 0.05)
thrpt: [+49.428% +49.560% +49.689%]
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) high mild
2 (2.00%) high severe
int64/max nullable time: [12.350 µs 12.354 µs 12.360 µs]
thrpt: [39.506 GiB/s 39.524 GiB/s 39.538 GiB/s]
change:
time: [-33.160% -33.108% -33.059%] (p = 0.00 < 0.05)
thrpt: [+49.385% +49.495% +49.611%]
Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
1 (1.00%) low severe
9 (9.00%) high mild
3 (3.00%) high severe
string/min nonnull time: [143.39 µs 143.43 µs 143.48 µs]
thrpt: [456.75 Melem/s 456.91 Melem/s 457.06 Melem/s]
change:
time: [+0.7244% +0.8870% +1.0301%] (p = 0.00 < 0.05)
thrpt: [-1.0196% -0.8792% -0.7192%]
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low severe
2 (2.00%) low mild
2 (2.00%) high mild
3 (3.00%) high severe
string/max nonnull time: [142.06 µs 142.14 µs 142.24 µs]
thrpt: [460.74 Melem/s 461.05 Melem/s 461.31 Melem/s]
change:
time: [+0.0195% +0.1450% +0.2623%] (p = 0.02 < 0.05)
thrpt: [-0.2616% -0.1448% -0.0195%]
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
string/min nullable time: [262.80 µs 263.15 µs 263.51 µs]
thrpt: [248.70 Melem/s 249.05 Melem/s 249.38 Melem/s]
change:
time: [+5.3416% +5.5494% +5.7454%] (p = 0.00 < 0.05)
thrpt: [-5.4332% -5.2576% -5.0707%]
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) low mild
2 (2.00%) high mild
string/max nullable time: [277.49 µs 277.74 µs 277.98 µs]
thrpt: [235.75 Melem/s 235.96 Melem/s 236.18 Melem/s]
change:
time: [+2.8718% +3.0640% +3.2428%] (p = 0.00 < 0.05)
thrpt: [-3.1409% -2.9729% -2.7916%]
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low severe
3 (3.00%) low mild
```
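When reading the `change:` lines, note that the time and throughput percentages are two views of the same ratio, since throughput is inversely proportional to time. A small sketch of the conversion follows; the function names are hypothetical and not part of Criterion or arrow-rs.

```rust
/// Converts a relative change in time (e.g. 0.70174 for +70.174%)
/// into the corresponding relative change in throughput.
/// Throughput scales as 1 / time, so the new/old ratio is 1 / (1 + time_change).
fn thrpt_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}

/// Slowdown factor implied by a relative change in time
/// (values below 1.0 mean the new code is faster).
fn slowdown_factor(time_change: f64) -> f64 {
    1.0 + time_change
}
```

With the `float32/sum nullable` estimate of +70.174% in time, `thrpt_change(0.70174)` gives about -0.412, matching the reported -41.237%. For `int8/sum nullable`, +992.98% in time is a slowdown factor of about 10.9, which corresponds to the reported -90.851% in throughput.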