jhorstmann opened a new pull request, #2920:
URL: https://github.com/apache/arrow-rs/pull/2920
# Which issue does this PR close?
<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases. You can
link an issue to this PR using the GitHub syntax. For example `Closes #123`
indicates that this PR will close issue #123.
-->
Closes #2856.
# Rationale for this change
The `packed_simd` crate is no longer actively maintained and now also causes
some issues when run under miri.
Performance numbers are inconclusive. The `sum` aggregation got a bit
faster, but `min`/`max` got slower instead.
# What changes are included in this PR?
<!--
There is no need to duplicate the description in the issue here but it is
sometimes worth providing a summary of the individual changes in this PR.
-->
# Are there any user-facing changes?
- `IntervalMonthDayNanoType` is no longer a numeric type because of missing
support for `i128` lanes in `portable_simd`. The arithmetic kernels on this
type should work the same as before since they did not really rely on the
numeric type and could instead work with any primitive type. Comparison kernels
for this type are no longer implemented though.
- `ArrowFloatNumericType` no longer contains a `pow` method because this is
not yet implemented in `portable_simd`. It seems this was only enabled in the
simd version of the trait, so it is unlikely there are many users.
<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->
<!---
If there are any breaking changes to public APIs, please add the `breaking
change` label.
-->
# Aggregation benchmark results
```
$ RUSTFLAGS="-Ctarget-cpu=native -Copt-level=3" perf record cargo +stage1
bench --features simd,test_utils --bench aggregate_kernels -- --baseline
base-avx512
Finished bench [optimized] target(s) in 48.15s
Running benches/aggregate_kernels.rs
(/home/jhorstmann/Source/github/apache/arrow-rs/target/release/deps/aggregate_kernels-1fb22fbe1680a1da)
float/sum 4096 time: [269.73 ns 269.82 ns 269.93 ns]
thrpt: [56.528 GiB/s 56.552 GiB/s 56.571 GiB/s]
change:
time: [-5.4739% -5.4122% -5.3486%] (p = 0.00 <
0.05)
thrpt: [+5.6508% +5.7219% +5.7909%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
float/min 4096 time: [488.84 ns 488.98 ns 489.15 ns]
thrpt: [31.194 GiB/s 31.205 GiB/s 31.214 GiB/s]
change:
time: [+1.5681% +1.6031% +1.6388%] (p = 0.00 <
0.05)
thrpt: [-1.6124% -1.5778% -1.5439%]
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
1 (1.00%) low mild
4 (4.00%) high mild
9 (9.00%) high severe
float/max 4096 time: [429.91 ns 429.96 ns 430.03 ns]
thrpt: [35.483 GiB/s 35.489 GiB/s 35.493 GiB/s]
change:
time: [+1.1078% +1.1743% +1.2240%] (p = 0.00 <
0.05)
thrpt: [-1.2092% -1.1606% -1.0957%]
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low mild
2 (2.00%) high severe
float/sum nulls 4096 time: [350.29 ns 350.31 ns 350.33 ns]
thrpt: [43.556 GiB/s 43.558 GiB/s 43.560 GiB/s]
change:
time: [-26.221% -26.194% -26.173%] (p = 0.00 <
0.05)
thrpt: [+35.452% +35.490% +35.540%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low mild
3 (3.00%) high mild
3 (3.00%) high severe
float/min nulls 4096 time: [1.2936 µs 1.2939 µs 1.2943 µs]
thrpt: [11.789 GiB/s 11.793 GiB/s 11.795 GiB/s]
change:
time: [+18.173% +18.230% +18.275%] (p = 0.00 <
0.05)
thrpt: [-15.451% -15.419% -15.378%]
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
6 (6.00%) high mild
1 (1.00%) high severe
float/max nulls 4096 time: [1.1110 µs 1.1111 µs 1.1112 µs]
thrpt: [13.732 GiB/s 13.733 GiB/s 13.734 GiB/s]
change:
time: [+3.3113% +3.3532% +3.3820%] (p = 0.00 <
0.05)
thrpt: [-3.2714% -3.2444% -3.2052%]
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) low mild
5 (5.00%) high mild
2 (2.00%) high severe
double/sum 4096 time: [568.14 ns 568.17 ns 568.20 ns]
thrpt: [53.709 GiB/s 53.712 GiB/s 53.715 GiB/s]
change:
time: [-2.0800% -1.9542% -1.8825%] (p = 0.00 <
0.05)
thrpt: [+1.9186% +1.9931% +2.1241%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) low mild
1 (1.00%) high mild
4 (4.00%) high severe
double/min 4096 time: [948.68 ns 948.72 ns 948.76 ns]
thrpt: [32.166 GiB/s 32.167 GiB/s 32.169 GiB/s]
change:
time: [+1.3964% +1.4045% +1.4134%] (p = 0.00 <
0.05)
thrpt: [-1.3937% -1.3850% -1.3772%]
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) low mild
4 (4.00%) high mild
1 (1.00%) high severe
double/max 4096 time: [832.17 ns 832.27 ns 832.40 ns]
thrpt: [36.662 GiB/s 36.668 GiB/s 36.672 GiB/s]
change:
time: [+1.9066% +1.9232% +1.9399%] (p = 0.00 <
0.05)
thrpt: [-1.9029% -1.8869% -1.8709%]
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
4 (4.00%) high mild
5 (5.00%) high severe
double/sum nulls 4096 time: [654.15 ns 654.19 ns 654.24 ns]
thrpt: [46.646 GiB/s 46.649 GiB/s 46.653 GiB/s]
change:
time: [-16.146% -16.137% -16.129%] (p = 0.00 <
0.05)
thrpt: [+19.231% +19.243% +19.255%]
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe
double/min nulls 4096 time: [2.5084 µs 2.5086 µs 2.5090 µs]
thrpt: [12.163 GiB/s 12.165 GiB/s 12.166 GiB/s]
change:
time: [+18.631% +18.650% +18.672%] (p = 0.00 <
0.05)
thrpt: [-15.734% -15.719% -15.705%]
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
2 (2.00%) low mild
4 (4.00%) high mild
4 (4.00%) high severe
double/max nulls 4096 time: [2.1508 µs 2.1509 µs 2.1510 µs]
thrpt: [14.188 GiB/s 14.189 GiB/s 14.189 GiB/s]
change:
time: [+0.9921% +1.0044% +1.0168%] (p = 0.00 <
0.05)
thrpt: [-1.0065% -0.9945% -0.9824%]
Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
1 (1.00%) low mild
3 (3.00%) high mild
5 (5.00%) high severe
string/min string 4096 time: [11.783 µs 11.786 µs 11.789 µs]
thrpt: [2.5887 GiB/s 2.5894 GiB/s 2.5901 GiB/s]
change:
time: [-0.4596% -0.4101% -0.3598%] (p = 0.00 <
0.05)
thrpt: [+0.3611% +0.4118% +0.4617%]
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
```
The `portable_simd` crate has an internal switch to a bit-based mask
representation when the avx512 target feature is enabled. Since target features
do not apply to the standard library this can only be used via `Zbuild-std` or
by using the crate as a regular dependency. I'm still looking into the
performance with that feature activated (see
https://github.com/rust-lang/portable-simd/issues/312 and
https://github.com/llvm/llvm-project/issues/58546).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]