nevi-me opened a new pull request #7037: URL: https://github.com/apache/arrow/pull/7037
This removes the dependency on `packed_simd`. I initially thought that the boolean kernels were slower than with explicit SIMD, but this was a false alarm: the benchmarks weren't actually comparing SIMD vs non-SIMD.

While doing this, I noticed that the `divide` kernel appears to be unsound, as its zero check also looks at null slots, and a null slot can read as 0 when the default data behind the bitmask is 0.

Below is the performance comparison:

<details>
<summary>From 0.15.0 to 0.16.0</summary>

```text
Running target/release/deps/arithmetic_kernels-ba6ab3db9f184b40
add 512                 time:   [15.565 us 15.623 us 15.694 us]
                        change: [-66.359% -66.104% -65.861%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
add 512 simd            time:   [14.939 us 16.768 us 18.744 us]
                        change: [+1.4006% +6.0795% +11.131%] (p = 0.02 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) high mild
  8 (8.00%) high severe
subtract 512            time:   [15.659 us 15.727 us 15.799 us]
                        change: [-65.994% -65.847% -65.690%] (p = 0.00 < 0.05)
                        Performance has improved.
subtract 512 simd       time:   [14.003 us 14.119 us 14.284 us]
                        change: [-4.9276% -3.2446% -1.6479%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
multiply 512            time:   [15.774 us 15.824 us 15.875 us]
                        change: [-65.694% -65.526% -65.352%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
multiply 512 simd       time:   [14.299 us 14.458 us 14.681 us]
                        change: [-0.9771% -0.0444% +0.9882%] (p = 0.93 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
divide 512              time:   [16.690 us 16.731 us 16.774 us]
                        change: [-65.394% -65.012% -64.701%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild divide 512 simd time: [16.098 us 16.147 us 16.202 us] change: [-3.6005% -2.6939% -1.9439%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe sum 512 no simd time: [7.1888 us 7.2836 us 7.4349 us] change: [-1.2993% -0.2501% +1.2521%] (p = 0.73 > 0.05) No change in performance detected. Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe limit 512, 256 no simd time: [6.8801 us 6.9257 us 6.9792 us] change: [-3.8909% -2.7450% -1.6742%] (p = 0.00 < 0.05) Performance has improved. Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) high mild 7 (7.00%) high severe limit 512, 512 no simd time: [6.8552 us 6.9007 us 6.9552 us] change: [-36.783% -31.294% -25.031%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 4 (4.00%) high mild 3 (3.00%) high severe Running target/release/deps/array_from_vec-9acb1269f64e7733 array_from_vec 128 time: [418.62 ns 423.66 ns 430.30 ns] change: [-2.2547% -0.6846% +0.9641%] (p = 0.48 > 0.05) No change in performance detected. Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) high mild 5 (5.00%) high severe array_from_vec 256 time: [659.91 ns 661.68 ns 663.62 ns] change: [-2.1474% -1.6329% -1.1820%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe array_from_vec 512 time: [1.1200 us 1.1244 us 1.1304 us] change: [-2.9911% -2.3466% -1.7654%] (p = 0.00 < 0.05) Performance has improved. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe Running target/release/deps/boolean_kernels-25e7d12fe4fd7f63 and time: [51.779 us 51.928 us 52.109 us] change: [-0.4891% -0.0148% +0.4579%] (p = 0.95 > 0.05) No change in performance detected. 
Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild and simd time: [10.417 us 10.561 us 10.831 us] change: [-5.4340% -4.3339% -2.6810%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 1 (1.00%) low mild 2 (2.00%) high mild 3 (3.00%) high severe or time: [52.372 us 52.663 us 52.978 us] change: [-1.0637% -0.3796% +0.3087%] (p = 0.30 > 0.05) No change in performance detected. Found 8 outliers among 100 measurements (8.00%) 6 (6.00%) high mild 2 (2.00%) high severe or simd time: [10.330 us 10.366 us 10.404 us] change: [-9.4316% -7.8623% -6.4004%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe not time: [28.368 us 28.506 us 28.684 us] change: [-1.4424% -0.5625% +0.4723%] (p = 0.25 > 0.05) No change in performance detected. Found 4 outliers among 100 measurements (4.00%) 2 (2.00%) high mild 2 (2.00%) high severe not simd time: [5.3160 us 5.3966 us 5.5020 us] change: [-3.9861% -3.2280% -2.1942%] (p = 0.00 < 0.05) Performance has improved. Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) high mild 2 (2.00%) high severe Running target/release/deps/builder-3c9f08ea07165746 bench_primitive time: [3.8598 ms 3.8751 ms 3.8926 ms] thrpt: [1.0035 GiB/s 1.0080 GiB/s 1.0120 GiB/s] change: time: [-5.4645% -3.0955% -1.0229%] (p = 0.00 < 0.05) thrpt: [+1.0334% +3.1944% +5.7803%] Performance has improved. Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe bench_bool time: [2.5218 ms 2.5568 ms 2.6091 ms] thrpt: [191.64 MiB/s 195.55 MiB/s 198.27 MiB/s] change: time: [-4.0174% -3.2203% -2.2971%] (p = 0.00 < 0.05) thrpt: [+2.3511% +3.3275% +4.1855%] Performance has improved. 
Found 5 outliers among 100 measurements (5.00%) 2 (2.00%) high mild 3 (3.00%) high severe Running target/release/deps/cast_kernels-28d78edc8dd97880 cast int32 to int32 512 time: [382.90 ns 385.84 ns 389.83 ns] change: [+12.520% +19.250% +27.267%] (p = 0.00 < 0.05) Performance has regressed. cast int32 to uint32 512 time: [14.323 us 14.362 us 14.403 us] change: [-2.6982% -2.1982% -1.7082%] (p = 0.00 < 0.05) Performance has improved. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe cast int32 to float32 512 time: [14.892 us 15.000 us 15.112 us] change: [-0.1973% +0.3037% +0.8193%] (p = 0.26 > 0.05) No change in performance detected. Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) high mild 2 (2.00%) high severe cast int32 to float64 512 time: [14.827 us 14.904 us 14.993 us] change: [-3.4069% -2.2322% -1.1900%] (p = 0.00 < 0.05) Performance has improved. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe cast int32 to int64 512 time: [14.756 us 14.803 us 14.852 us] change: [-1.8245% -1.2044% -0.5979%] (p = 0.00 < 0.05) Change within noise threshold. Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high mild cast float32 to int32 512 time: [15.831 us 15.953 us 16.136 us] change: [+1.2994% +2.0176% +2.9286%] (p = 0.00 < 0.05) Performance has regressed. Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high severe cast float64 to float32 512 time: [15.355 us 15.443 us 15.534 us] change: [-0.6370% +0.0148% +0.7769%] (p = 0.97 > 0.05) No change in performance detected. Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe cast float64 to uint64 512 time: [15.283 us 15.339 us 15.402 us] change: [-6.0895% -4.3975% -2.8328%] (p = 0.00 < 0.05) Performance has improved. 
Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe cast int64 to int32 512 time: [14.008 us 14.053 us 14.102 us] change: [-8.6791% -7.2588% -5.9678%] (p = 0.00 < 0.05) Performance has improved. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe cast date64 to date32 512 time: [16.473 us 16.673 us 16.943 us] change: [+0.6577% +1.4106% +2.2966%] (p = 0.00 < 0.05) Change within noise threshold. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) high mild 5 (5.00%) high severe cast date32 to date64 512 time: [16.043 us 16.125 us 16.211 us] change: [-1.9078% -1.0437% -0.0086%] (p = 0.02 < 0.05) Change within noise threshold. Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe cast time32s to time32ms 512 time: [1.2209 us 1.2430 us 1.2806 us] change: [-0.2161% +0.8401% +2.0102%] (p = 0.16 > 0.05) No change in performance detected. Found 6 outliers among 100 measurements (6.00%) 2 (2.00%) high mild 4 (4.00%) high severe cast time32s to time64us 512 time: [16.159 us 16.238 us 16.344 us] change: [-2.0200% -1.3127% -0.5458%] (p = 0.00 < 0.05) Change within noise threshold. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe cast time64ns to time32s 512 time: [18.420 us 18.485 us 18.558 us] change: [-3.2611% -2.8053% -2.3354%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild cast timestamp_ns to timestamp_s 512 time: [464.73 ns 465.98 ns 467.25 ns] change: [+2.4127% +3.5905% +4.5861%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe cast timestamp_ms to timestamp_ns 512 time: [1.8519 us 1.8637 us 1.8805 us] change: [+1.8917% +2.6618% +3.4497%] (p = 0.00 < 0.05) Performance has regressed. 
Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high severe cast timestamp_ms to i64 512 time: [620.77 ns 625.18 ns 632.26 ns] change: [+0.3064% +1.3612% +2.6592%] (p = 0.02 < 0.05) Change within noise threshold. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high severe Running target/release/deps/comparison_kernels-ac9079b90aba41c8 eq 512 time: [15.227 us 15.269 us 15.314 us] change: [-65.188% -65.051% -64.916%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild eq 512 simd time: [16.285 us 16.382 us 16.503 us] change: [-4.4614% -1.4725% +2.5033%] (p = 0.49 > 0.05) No change in performance detected. Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe neq 512 time: [15.396 us 15.550 us 15.813 us] change: [-67.464% -66.399% -65.402%] (p = 0.00 < 0.05) Performance has improved. Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe neq 512 simd time: [16.248 us 16.348 us 16.477 us] change: [-5.7272% -5.0473% -4.3194%] (p = 0.00 < 0.05) Performance has improved. Found 4 outliers among 100 measurements (4.00%) 2 (2.00%) high mild 2 (2.00%) high severe lt 512 time: [15.544 us 15.617 us 15.705 us] change: [-63.654% -63.364% -63.078%] (p = 0.00 < 0.05) Performance has improved. Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) high mild 5 (5.00%) high severe lt 512 simd time: [16.309 us 16.502 us 16.796 us] change: [-7.1156% -5.4121% -3.7540%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 1 (1.00%) high mild 5 (5.00%) high severe lt_eq 512 time: [16.197 us 16.797 us 17.577 us] change: [-62.842% -60.475% -57.947%] (p = 0.00 < 0.05) Performance has improved. 
Found 16 outliers among 100 measurements (16.00%) 2 (2.00%) high mild 14 (14.00%) high severe lt_eq 512 simd time: [16.391 us 16.549 us 16.755 us] change: [-4.1794% -2.5540% -0.5409%] (p = 0.00 < 0.05) Change within noise threshold. Found 9 outliers among 100 measurements (9.00%) 4 (4.00%) high mild 5 (5.00%) high severe gt 512 time: [15.320 us 15.386 us 15.469 us] change: [-64.783% -64.475% -64.077%] (p = 0.00 < 0.05) Performance has improved. Found 5 outliers among 100 measurements (5.00%) 1 (1.00%) high mild 4 (4.00%) high severe gt 512 simd time: [16.428 us 16.579 us 16.824 us] change: [-5.8809% -4.9818% -4.0636%] (p = 0.00 < 0.05) Performance has improved. Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) high mild 3 (3.00%) high severe gt_eq 512 time: [15.373 us 15.423 us 15.476 us] change: [-65.439% -65.034% -64.706%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild gt_eq 512 simd time: [16.248 us 16.405 us 16.662 us] change: [-7.7800% -5.5240% -3.7804%] (p = 0.00 < 0.05) Performance has improved. Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe Running target/release/deps/csv_writer-b937777743b12b28 record_batches_to_csv time: [183.57 us 193.17 us 204.93 us] change: [-17.694% -4.8343% +8.9742%] (p = 0.51 > 0.05) No change in performance detected. Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) high mild 3 (3.00%) high severe Running target/release/deps/take_kernels-f3cc4f1980a08edc take u8 256 time: [21.429 us 21.479 us 21.532 us] change: [+4.0275% +4.5102% +5.0289%] (p = 0.00 < 0.05) Performance has regressed. Found 4 outliers among 100 measurements (4.00%) 4 (4.00%) high mild take u8 512 time: [39.899 us 40.042 us 40.204 us] change: [-1.1695% -0.6056% -0.0752%] (p = 0.03 < 0.05) Change within noise threshold. 
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild
take u8 1024            time:   [79.301 us 79.561 us 79.828 us]
                        change: [-1.6327% -1.0495% -0.4431%] (p = 0.00 < 0.05)
                        Change within noise threshold.
take i32 256            time:   [21.631 us 21.722 us 21.818 us]
                        change: [+3.5975% +4.3668% +5.1918%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
take i32 512            time:   [41.232 us 41.427 us 41.642 us]
                        change: [-3.7463% -3.3208% -2.9106%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
take i32 1024           time:   [80.877 us 81.279 us 81.730 us]
                        change: [-5.3008% -4.6572% -3.9401%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  3 (3.00%) high severe
take bool 256           time:   [23.209 us 23.288 us 23.377 us]
                        change: [-3.4634% -2.9723% -2.4941%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
take bool 512           time:   [45.849 us 46.050 us 46.268 us]
                        change: [-2.0658% -1.3602% -0.7648%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
take bool 1024          time:   [90.650 us 91.065 us 91.501 us]
                        change: [-1.0199% -0.4763% +0.1308%] (p = 0.09 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe
```

</details>

0.16.0 included the change that I made to autovectorize some compute kernels. This mainly resulted in non-SIMD kernels having a smaller performance gap (from 50-60% slower to 10-20% slower).
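For readers unfamiliar with the term, here is a rough sketch of what autovectorization-friendly code can look like. It is only an illustration under stated assumptions, not the change that went into 0.16.0: `add_autovec` is a hypothetical name, and plain slices stand in for Arrow buffers. Whether the compiler actually emits vector instructions depends on the target and optimization level.

```rust
/// Hypothetical element-wise add over plain slices (not the Arrow kernel).
/// A branch-free loop body over equal-length slices gives the compiler a
/// chance to vectorize the loop without any explicit SIMD types.
fn add_autovec(left: &[i32], right: &[i32]) -> Vec<i32> {
    // Establish the length relationship once, outside the hot loop.
    assert_eq!(left.len(), right.len());
    left.iter()
        .zip(right.iter())
        // wrapping_add avoids overflow checks (and panics) inside the loop.
        .map(|(l, r)| l.wrapping_add(*r))
        .collect()
}
```

Calling `add_autovec(&[1, 2, 3], &[10, 20, 30])` returns `[11, 22, 33]`; null handling is left out here to keep the loop body minimal.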
<details>
<summary>From 0.16.0 to no `packed_simd`</summary>

```text
Running target/release/deps/arithmetic_kernels-d263bafe1ecab93d
add 512                 time:   [16.502 us 16.676 us 16.925 us]
                        change: [+5.7488% +7.2699% +9.9557%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
add 512 simd            time:   [16.679 us 16.836 us 17.066 us]
                        change: [+4.7016% +10.619% +16.149%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
subtract 512            time:   [16.786 us 17.005 us 17.282 us]
                        change: [+6.8124% +7.9200% +9.2584%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  8 (8.00%) high mild
  5 (5.00%) high severe
subtract 512 simd       time:   [16.667 us 16.839 us 17.063 us]
                        change: [+17.579% +19.217% +20.952%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) high mild
  6 (6.00%) high severe
multiply 512            time:   [17.637 us 19.687 us 21.976 us]
                        change: [+7.2304% +12.033% +19.261%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) high mild
  9 (9.00%) high severe
multiply 512 simd       time:   [16.551 us 16.631 us 16.720 us]
                        change: [+14.169% +15.372% +16.456%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
divide 512              time:   [17.315 us 17.365 us 17.419 us]
                        change: [+3.4441% +3.9221% +4.4103%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
divide 512 simd         time:   [17.266 us 17.326 us 17.388 us]
                        change: [+6.7777% +7.2835% +7.7956%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%) 4 (4.00%) high mild sum 512 no simd time: [8.5583 us 8.6700 us 8.8042 us] change: [+17.027% +18.834% +20.593%] (p = 0.00 < 0.05) Performance has regressed. Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) high mild 2 (2.00%) high severe limit 512, 256 no simd time: [7.3637 us 7.4095 us 7.4616 us] change: [+5.2888% +6.4103% +7.5328%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high severe limit 512, 512 no simd time: [7.3472 us 7.3736 us 7.4017 us] change: [+5.8054% +6.9044% +7.9027%] (p = 0.00 < 0.05) Performance has regressed. Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe Running target/release/deps/array_from_vec-6e44aa2d195a3b96 array_from_vec 128 time: [431.24 ns 433.22 ns 435.41 ns] change: [-0.6088% +1.3213% +2.9253%] (p = 0.15 > 0.05) No change in performance detected. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild array_from_vec 256 time: [681.05 ns 686.51 ns 694.58 ns] change: [+2.8104% +3.3521% +3.9589%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe array_from_vec 512 time: [1.1477 us 1.1523 us 1.1576 us] change: [+1.8839% +2.3623% +2.8052%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild Running target/release/deps/boolean_kernels-2fceb4e9cf7f69d5 and time: [49.531 us 49.661 us 49.796 us] change: [-5.3433% -4.8688% -4.3550%] (p = 0.00 < 0.05) Performance has improved. Found 5 outliers among 100 measurements (5.00%) 2 (2.00%) high mild 3 (3.00%) high severe and simd time: [9.2264 us 9.4280 us 9.7238 us] change: [-12.786% -10.898% -8.7337%] (p = 0.00 < 0.05) Performance has improved. 
Found 10 outliers among 100 measurements (10.00%) 4 (4.00%) high mild 6 (6.00%) high severe or time: [49.827 us 50.029 us 50.253 us] change: [-5.2611% -4.5904% -3.9306%] (p = 0.00 < 0.05) Performance has improved. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe or simd time: [9.1050 us 9.1360 us 9.1669 us] change: [-13.413% -12.471% -11.459%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) low mild 4 (4.00%) high mild 1 (1.00%) high severe not time: [26.908 us 27.048 us 27.226 us] change: [-5.7470% -4.5050% -3.0267%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 4 (4.00%) high mild 3 (3.00%) high severe not simd time: [4.9087 us 4.9665 us 5.0441 us] change: [-9.5757% -8.4957% -7.3691%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) high mild 5 (5.00%) high severe Running target/release/deps/builder-47e9dfab54e83426 Benchmarking bench_primitive: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 20.7s or reduce sample count to 30. bench_primitive time: [4.0583 ms 4.0759 ms 4.0959 ms] thrpt: [976.58 MiB/s 981.38 MiB/s 985.64 MiB/s] change: time: [+3.6774% +4.9561% +5.9081%] (p = 0.00 < 0.05) thrpt: [-5.5785% -4.7220% -3.5469%] Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 8 (8.00%) high mild 1 (1.00%) high severe Benchmarking bench_bool: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 13.3s or reduce sample count to 40. bench_bool time: [2.6125 ms 2.6395 ms 2.6794 ms] thrpt: [186.61 MiB/s 189.43 MiB/s 191.39 MiB/s] change: time: [+2.4849% +3.4680% +4.3829%] (p = 0.00 < 0.05) thrpt: [-4.1989% -3.3517% -2.4246%] Performance has regressed. 
Found 4 outliers among 100 measurements (4.00%) 2 (2.00%) high mild 2 (2.00%) high severe Running target/release/deps/cast_kernels-c9770d72fe9b204b cast int32 to int32 512 time: [360.61 ns 363.39 ns 367.77 ns] change: [-25.632% -21.210% -16.494%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high severe cast int32 to uint32 512 time: [14.567 us 14.603 us 14.645 us] change: [+0.9026% +1.3147% +1.7565%] (p = 0.00 < 0.05) Change within noise threshold. Found 4 outliers among 100 measurements (4.00%) 4 (4.00%) high mild cast int32 to float32 512 time: [14.972 us 15.117 us 15.275 us] change: [+0.4208% +1.2079% +2.0879%] (p = 0.00 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 5 (5.00%) high mild 3 (3.00%) high severe cast int32 to float64 512 time: [14.929 us 14.996 us 15.077 us] change: [+0.4568% +0.9965% +1.5083%] (p = 0.00 < 0.05) Change within noise threshold. Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe cast int32 to int64 512 time: [14.880 us 14.920 us 14.961 us] change: [-0.0546% +0.3857% +0.8089%] (p = 0.08 > 0.05) No change in performance detected. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild cast float32 to int32 512 time: [16.245 us 16.334 us 16.439 us] change: [+1.8067% +2.7560% +3.6900%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe cast float64 to float32 512 time: [15.802 us 15.852 us 15.905 us] change: [+1.9591% +2.7809% +3.4604%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe cast float64 to uint64 512 time: [16.293 us 16.333 us 16.374 us] change: [+6.0229% +6.4724% +6.9067%] (p = 0.00 < 0.05) Performance has regressed. 
cast int64 to int32 512 time: [14.526 us 14.591 us 14.668 us] change: [+4.2376% +4.7904% +5.2952%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild cast date64 to date32 512 time: [16.931 us 17.066 us 17.226 us] change: [+1.0479% +2.0576% +2.9920%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe cast date32 to date64 512 time: [16.583 us 16.648 us 16.713 us] change: [+2.7951% +3.6554% +4.3566%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild cast time32s to time32ms 512 time: [2.1082 us 2.1162 us 2.1253 us] change: [+69.447% +71.604% +73.385%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 4 (4.00%) high mild 3 (3.00%) high severe cast time32s to time64us 512 time: [17.054 us 17.121 us 17.197 us] change: [+5.1406% +5.8858% +6.5773%] (p = 0.00 < 0.05) Performance has regressed. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe cast time64ns to time32s 512 time: [19.334 us 19.544 us 19.791 us] change: [+5.1059% +6.0233% +6.9969%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 4 (4.00%) high mild 6 (6.00%) high severe cast timestamp_ns to timestamp_s 512 time: [555.04 ns 560.31 ns 567.19 ns] change: [+35.093% +43.293% +50.432%] (p = 0.00 < 0.05) Performance has regressed. cast timestamp_ms to timestamp_ns 512 time: [2.4111 us 2.4192 us 2.4283 us] change: [+29.614% +30.468% +31.257%] (p = 0.00 < 0.05) Performance has regressed. Found 4 outliers among 100 measurements (4.00%) 2 (2.00%) high mild 2 (2.00%) high severe cast timestamp_ms to i64 512 time: [692.68 ns 695.43 ns 698.67 ns] change: [+10.064% +11.531% +12.672%] (p = 0.00 < 0.05) Performance has regressed. 
Found 4 outliers among 100 measurements (4.00%) 4 (4.00%) high mild Running target/release/deps/comparison_kernels-0133fee8f9747e38 eq 512 time: [17.686 us 17.944 us 18.310 us] change: [+14.993% +15.993% +17.578%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 2 (2.00%) high mild 4 (4.00%) high severe eq 512 simd time: [17.859 us 17.937 us 18.016 us] change: [+2.0950% +6.1522% +9.2180%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild neq 512 time: [17.483 us 17.547 us 17.625 us] change: [+12.430% +13.250% +13.942%] (p = 0.00 < 0.05) Performance has regressed. Found 5 outliers among 100 measurements (5.00%) 5 (5.00%) high mild neq 512 simd time: [17.413 us 17.522 us 17.678 us] change: [+6.9202% +8.2899% +10.309%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe lt 512 time: [17.710 us 17.893 us 18.162 us] change: [+13.270% +14.421% +15.629%] (p = 0.00 < 0.05) Performance has regressed. Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) high mild 3 (3.00%) high severe lt 512 simd time: [17.684 us 17.790 us 17.921 us] change: [+6.6510% +8.3058% +9.8628%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 3 (3.00%) high mild 4 (4.00%) high severe lt_eq 512 time: [17.597 us 17.660 us 17.728 us] change: [-7.2662% -0.7635% +5.7716%] (p = 0.83 > 0.05) No change in performance detected. Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high mild lt_eq 512 simd time: [17.625 us 17.713 us 17.819 us] change: [+4.9823% +7.0228% +8.8962%] (p = 0.00 < 0.05) Performance has regressed. Found 4 outliers among 100 measurements (4.00%) 2 (2.00%) high mild 2 (2.00%) high severe gt 512 time: [17.603 us 17.890 us 18.252 us] change: [+12.682% +14.159% +15.481%] (p = 0.00 < 0.05) Performance has regressed. 
Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe gt 512 simd time: [17.557 us 17.724 us 17.914 us] change: [+5.7046% +6.8432% +8.2057%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 2 (2.00%) high mild 6 (6.00%) high severe gt_eq 512 time: [17.544 us 17.610 us 17.684 us] change: [+14.041% +16.179% +19.229%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe gt_eq 512 simd time: [17.749 us 18.011 us 18.395 us] change: [+8.8024% +10.461% +12.459%] (p = 0.00 < 0.05) Performance has regressed. Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) high mild 3 (3.00%) high severe Running target/release/deps/csv_writer-9387f0497783e820 record_batches_to_csv time: [179.16 us 189.32 us 202.58 us] change: [-11.723% -3.6958% +5.6507%] (p = 0.42 > 0.05) No change in performance detected. Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) high mild 3 (3.00%) high severe Running target/release/deps/take_kernels-0b0e071e2159c546 take u8 256 time: [21.422 us 21.507 us 21.598 us] change: [-0.4883% -0.0299% +0.4050%] (p = 0.88 > 0.05) No change in performance detected. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild take u8 512 time: [41.737 us 41.850 us 41.971 us] change: [+3.9654% +4.4028% +4.8348%] (p = 0.00 < 0.05) Performance has regressed. Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high mild take u8 1024 time: [83.299 us 84.170 us 85.421 us] change: [+5.2941% +6.1255% +7.3670%] (p = 0.00 < 0.05) Performance has regressed. Found 4 outliers among 100 measurements (4.00%) 2 (2.00%) high mild 2 (2.00%) high severe take i32 256 time: [21.295 us 21.428 us 21.594 us] change: [-3.7994% -2.8372% -1.8433%] (p = 0.00 < 0.05) Performance has improved. 
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe
take i32 512            time:   [41.618 us 41.774 us 41.956 us]
                        change: [+0.1984% +0.7887% +1.4393%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
take i32 1024           time:   [83.318 us 85.386 us 87.753 us]
                        change: [+2.6187% +4.5585% +6.8482%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) high mild
  12 (12.00%) high severe
take bool 256           time:   [23.326 us 23.391 us 23.460 us]
                        change: [-0.1620% +0.3110% +0.7507%] (p = 0.19 > 0.05)
                        No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low severe
take bool 512           time:   [47.138 us 47.325 us 47.518 us]
                        change: [+3.1552% +3.8603% +4.7518%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
take bool 1024          time:   [91.152 us 91.459 us 91.783 us]
                        change: [-0.0296% +0.5257% +1.0275%] (p = 0.05 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
```

</details>

* Arithmetic kernels are slower by up to 20%
* Boolean kernels are faster by roughly 5-12%
* Comparison kernels are slower by roughly 6-10% (ignore the non-SIMD ones)
* Cast kernels regress by varying degrees, with a few temporal casts being 30-70% slower (`cast time32s to time32ms` is the worst at about +72%). When I wrote the cast kernels, I had to pick the faster options when casting temporal types, so I'd need to revisit these to fix the extreme perf drops.

## Are the perf drops worth it?

I suppose it'll boil down to whether getting closer to stable Rust (without feature flags) is worth the slight performance drop.
## Outstanding work to do

- [ ] Remove some benchmarks that become redundant (SIMD vs non-SIMD)
- [ ] Fix the divide by zero error
- [ ] Tweak temporal casts to find faster options
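
On the divide by zero item, here is a minimal sketch of the kind of check I have in mind. It is not the actual kernel: `divide_nullable` is a hypothetical name, and a `&[bool]` validity mask stands in for Arrow's packed null bitmap. The idea is to consult validity before testing the divisor, so a 0 sitting behind a null slot is never treated as a real divisor.

```rust
/// Hypothetical, simplified divide over values plus validity masks
/// (`true` = valid). Real Arrow arrays use a packed bitmap instead.
fn divide_nullable(
    left: &[i32],
    left_valid: &[bool],
    right: &[i32],
    right_valid: &[bool],
) -> Result<Vec<Option<i32>>, String> {
    assert_eq!(left.len(), right.len());
    assert_eq!(left.len(), left_valid.len());
    assert_eq!(right.len(), right_valid.len());

    let mut out = Vec::with_capacity(left.len());
    for i in 0..left.len() {
        if !(left_valid[i] && right_valid[i]) {
            // Null in, null out. The value behind a null slot may be 0,
            // so it must not reach the zero check below.
            out.push(None);
        } else if right[i] == 0 {
            // Only a valid zero divisor is an error.
            return Err(format!("divide by zero at index {}", i));
        } else {
            // wrapping_div avoids a panic on i32::MIN / -1.
            out.push(Some(left[i].wrapping_div(right[i])));
        }
    }
    Ok(out)
}
```

With divisors `[2, 0, 5]` where the middle slot is null, dividing `[10, 20, 30]` yields `Ok([Some(5), None, Some(6)])` rather than a divide by zero error; a valid 0 divisor still returns `Err`.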