nevi-me opened a new pull request #7037:
URL: https://github.com/apache/arrow/pull/7037


   This removes the dependency on `packed_simd`. I initially thought that the 
boolean kernels were slower without explicit SIMD, but this was a false alarm: 
the benchmarks weren't actually comparing SIMD vs non-SIMD.
   
   While doing this, I noticed that the `divide` kernel appears to be unsound, 
as it checks whether the value in a null slot is 0 (which can be true when the 
default data behind the bitmask is 0).
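
To illustrate the concern, here is a minimal sketch that does not use Arrow's types. The names `divide` and `DivideError`, and the `bool` slices standing in for the validity bitmask, are hypothetical and are not the kernel's actual code. The point is only the ordering: validity is checked before the zero test, so a null divisor whose backing value happens to be 0 yields a null rather than a divide-by-zero error.

```rust
/// Hypothetical error type for this sketch (not Arrow's error type).
#[derive(Debug, PartialEq)]
enum DivideError {
    DivideByZero,
}

/// Element-wise division over raw values plus per-slot validity flags.
/// `*_valid[i] == false` means slot `i` is null, whatever its raw value is.
fn divide(
    left: &[i32],
    left_valid: &[bool],
    right: &[i32],
    right_valid: &[bool],
) -> Result<Vec<Option<i32>>, DivideError> {
    assert_eq!(left.len(), right.len());
    assert_eq!(left.len(), left_valid.len());
    assert_eq!(right.len(), right_valid.len());

    let mut out = Vec::with_capacity(left.len());
    for i in 0..left.len() {
        // Check validity first: a null slot's raw value is meaningless
        // and is often 0, so it must not trigger the zero check below.
        if !(left_valid[i] && right_valid[i]) {
            out.push(None);
            continue;
        }
        if right[i] == 0 {
            return Err(DivideError::DivideByZero);
        }
        // wrapping_div avoids a panic on i32::MIN / -1 in this sketch.
        out.push(Some(left[i].wrapping_div(right[i])));
    }
    Ok(out)
}

fn main() {
    // Slot 1 of the divisor is null and its backing value is 0.
    let result = divide(
        &[10, 7, 9],
        &[true, true, true],
        &[2, 0, 3],
        &[true, false, true],
    );
    println!("{:?}", result);

    // A valid zero divisor is still reported as an error.
    println!("{:?}", divide(&[1], &[true], &[0], &[true]));
}
```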
   
   Below is the performance comparison:
   
   <details>
   <summary>From 0.15.0 to 0.16.0</summary>
   
   ```rust
        Running target/release/deps/arithmetic_kernels-ba6ab3db9f184b40
   add 512                 time:   [15.565 us 15.623 us 15.694 us]
                           change: [-66.359% -66.104% -65.861%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     3 (3.00%) high mild
     1 (1.00%) high severe
   
   add 512 simd            time:   [14.939 us 16.768 us 18.744 us]
                           change: [+1.4006% +6.0795% +11.131%] (p = 0.02 < 
0.05)
                           Performance has regressed.
   Found 9 outliers among 100 measurements (9.00%)
     1 (1.00%) high mild
     8 (8.00%) high severe
   
   subtract 512            time:   [15.659 us 15.727 us 15.799 us]
                           change: [-65.994% -65.847% -65.690%] (p = 0.00 < 
0.05)
                           Performance has improved.
   
   subtract 512 simd       time:   [14.003 us 14.119 us 14.284 us]
                           change: [-4.9276% -3.2446% -1.6479%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     2 (2.00%) high mild
     3 (3.00%) high severe
   
   multiply 512            time:   [15.774 us 15.824 us 15.875 us]
                           change: [-65.694% -65.526% -65.352%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   multiply 512 simd       time:   [14.299 us 14.458 us 14.681 us]
                           change: [-0.9771% -0.0444% +0.9882%] (p = 0.93 > 
0.05)
                           No change in performance detected.
   Found 5 outliers among 100 measurements (5.00%)
     2 (2.00%) high mild
     3 (3.00%) high severe
   
   divide 512              time:   [16.690 us 16.731 us 16.774 us]
                           change: [-65.394% -65.012% -64.701%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   
   divide 512 simd         time:   [16.098 us 16.147 us 16.202 us]
                           change: [-3.6005% -2.6939% -1.9439%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     2 (2.00%) high mild
     1 (1.00%) high severe
   
   sum 512 no simd         time:   [7.1888 us 7.2836 us 7.4349 us]
                           change: [-1.2993% -0.2501% +1.2521%] (p = 0.73 > 
0.05)
                           No change in performance detected.
   Found 6 outliers among 100 measurements (6.00%)
     3 (3.00%) high mild
     3 (3.00%) high severe
   
   limit 512, 256 no simd  time:   [6.8801 us 6.9257 us 6.9792 us]
                           change: [-3.8909% -2.7450% -1.6742%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 9 outliers among 100 measurements (9.00%)
     2 (2.00%) high mild
     7 (7.00%) high severe
   
   limit 512, 512 no simd  time:   [6.8552 us 6.9007 us 6.9552 us]
                           change: [-36.783% -31.294% -25.031%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 7 outliers among 100 measurements (7.00%)
     4 (4.00%) high mild
     3 (3.00%) high severe
   
        Running target/release/deps/array_from_vec-9acb1269f64e7733
   array_from_vec 128      time:   [418.62 ns 423.66 ns 430.30 ns]
                           change: [-2.2547% -0.6846% +0.9641%] (p = 0.48 > 
0.05)
                           No change in performance detected.
   Found 8 outliers among 100 measurements (8.00%)
     3 (3.00%) high mild
     5 (5.00%) high severe
   
   array_from_vec 256      time:   [659.91 ns 661.68 ns 663.62 ns]
                           change: [-2.1474% -1.6329% -1.1820%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     2 (2.00%) high mild
     1 (1.00%) high severe
   
   array_from_vec 512      time:   [1.1200 us 1.1244 us 1.1304 us]
                           change: [-2.9911% -2.3466% -1.7654%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) high mild
     1 (1.00%) high severe
   
        Running target/release/deps/boolean_kernels-25e7d12fe4fd7f63
   and                     time:   [51.779 us 51.928 us 52.109 us]
                           change: [-0.4891% -0.0148% +0.4579%] (p = 0.95 > 
0.05)
                           No change in performance detected.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   and simd                time:   [10.417 us 10.561 us 10.831 us]
                           change: [-5.4340% -4.3339% -2.6810%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     1 (1.00%) low mild
     2 (2.00%) high mild
     3 (3.00%) high severe
   
   or                      time:   [52.372 us 52.663 us 52.978 us]
                           change: [-1.0637% -0.3796% +0.3087%] (p = 0.30 > 
0.05)
                           No change in performance detected.
   Found 8 outliers among 100 measurements (8.00%)
     6 (6.00%) high mild
     2 (2.00%) high severe
   
   or simd                 time:   [10.330 us 10.366 us 10.404 us]
                           change: [-9.4316% -7.8623% -6.4004%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     2 (2.00%) high mild
     1 (1.00%) high severe
   
   not                     time:   [28.368 us 28.506 us 28.684 us]
                           change: [-1.4424% -0.5625% +0.4723%] (p = 0.25 > 
0.05)
                           No change in performance detected.
   Found 4 outliers among 100 measurements (4.00%)
     2 (2.00%) high mild
     2 (2.00%) high severe
   
   not simd                time:   [5.3160 us 5.3966 us 5.5020 us]
                           change: [-3.9861% -3.2280% -2.1942%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     3 (3.00%) high mild
     2 (2.00%) high severe
   
        Running target/release/deps/builder-3c9f08ea07165746
   bench_primitive         time:   [3.8598 ms 3.8751 ms 3.8926 ms]
                           thrpt:  [1.0035 GiB/s 1.0080 GiB/s 1.0120 GiB/s]
                    change:
                           time:   [-5.4645% -3.0955% -1.0229%] (p = 0.00 < 
0.05)
                           thrpt:  [+1.0334% +3.1944% +5.7803%]
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     3 (3.00%) high mild
     1 (1.00%) high severe
   
   bench_bool              time:   [2.5218 ms 2.5568 ms 2.6091 ms]
                           thrpt:  [191.64 MiB/s 195.55 MiB/s 198.27 MiB/s]
                    change:
                           time:   [-4.0174% -3.2203% -2.2971%] (p = 0.00 < 
0.05)
                           thrpt:  [+2.3511% +3.3275% +4.1855%]
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     2 (2.00%) high mild
     3 (3.00%) high severe
   
        Running target/release/deps/cast_kernels-28d78edc8dd97880
   cast int32 to int32 512 time:   [382.90 ns 385.84 ns 389.83 ns]
                           change: [+12.520% +19.250% +27.267%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   
   cast int32 to uint32 512
                           time:   [14.323 us 14.362 us 14.403 us]
                           change: [-2.6982% -2.1982% -1.7082%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) high mild
     1 (1.00%) high severe
   
   cast int32 to float32 512
                           time:   [14.892 us 15.000 us 15.112 us]
                           change: [-0.1973% +0.3037% +0.8193%] (p = 0.26 > 
0.05)
                           No change in performance detected.
   Found 3 outliers among 100 measurements (3.00%)
     1 (1.00%) high mild
     2 (2.00%) high severe
   
   cast int32 to float64 512
                           time:   [14.827 us 14.904 us 14.993 us]
                           change: [-3.4069% -2.2322% -1.1900%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) high mild
     1 (1.00%) high severe
   
   cast int32 to int64 512 time:   [14.756 us 14.803 us 14.852 us]
                           change: [-1.8245% -1.2044% -0.5979%] (p = 0.00 < 
0.05)
                           Change within noise threshold.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high mild
   
   cast float32 to int32 512
                           time:   [15.831 us 15.953 us 16.136 us]
                           change: [+1.2994% +2.0176% +2.9286%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high severe
   
   cast float64 to float32 512
                           time:   [15.355 us 15.443 us 15.534 us]
                           change: [-0.6370% +0.0148% +0.7769%] (p = 0.97 > 
0.05)
                           No change in performance detected.
   Found 4 outliers among 100 measurements (4.00%)
     3 (3.00%) high mild
     1 (1.00%) high severe
   
   cast float64 to uint64 512
                           time:   [15.283 us 15.339 us 15.402 us]
                           change: [-6.0895% -4.3975% -2.8328%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     2 (2.00%) high mild
     1 (1.00%) high severe
   
   cast int64 to int32 512 time:   [14.008 us 14.053 us 14.102 us]
                           change: [-8.6791% -7.2588% -5.9678%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) high mild
     1 (1.00%) high severe
   
   cast date64 to date32 512
                           time:   [16.473 us 16.673 us 16.943 us]
                           change: [+0.6577% +1.4106% +2.2966%] (p = 0.00 < 
0.05)
                           Change within noise threshold.
   Found 7 outliers among 100 measurements (7.00%)
     2 (2.00%) high mild
     5 (5.00%) high severe
   
   cast date32 to date64 512
                           time:   [16.043 us 16.125 us 16.211 us]
                           change: [-1.9078% -1.0437% -0.0086%] (p = 0.02 < 
0.05)
                           Change within noise threshold.
   Found 4 outliers among 100 measurements (4.00%)
     3 (3.00%) high mild
     1 (1.00%) high severe
   
   cast time32s to time32ms 512
                           time:   [1.2209 us 1.2430 us 1.2806 us]
                           change: [-0.2161% +0.8401% +2.0102%] (p = 0.16 > 
0.05)
                           No change in performance detected.
   Found 6 outliers among 100 measurements (6.00%)
     2 (2.00%) high mild
     4 (4.00%) high severe
   
   cast time32s to time64us 512
                           time:   [16.159 us 16.238 us 16.344 us]
                           change: [-2.0200% -1.3127% -0.5458%] (p = 0.00 < 
0.05)
                           Change within noise threshold.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) high mild
     1 (1.00%) high severe
   
   cast time64ns to time32s 512
                           time:   [18.420 us 18.485 us 18.558 us]
                           change: [-3.2611% -2.8053% -2.3354%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   
   cast timestamp_ns to timestamp_s 512
                           time:   [464.73 ns 465.98 ns 467.25 ns]
                           change: [+2.4127% +3.5905% +4.5861%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 3 outliers among 100 measurements (3.00%)
     2 (2.00%) high mild
     1 (1.00%) high severe
   
   cast timestamp_ms to timestamp_ns 512
                           time:   [1.8519 us 1.8637 us 1.8805 us]
                           change: [+1.8917% +2.6618% +3.4497%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high severe
   
   cast timestamp_ms to i64 512
                           time:   [620.77 ns 625.18 ns 632.26 ns]
                           change: [+0.3064% +1.3612% +2.6592%] (p = 0.02 < 
0.05)
                           Change within noise threshold.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high severe
   
        Running target/release/deps/comparison_kernels-ac9079b90aba41c8
   eq 512                  time:   [15.227 us 15.269 us 15.314 us]
                           change: [-65.188% -65.051% -64.916%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   eq 512 simd             time:   [16.285 us 16.382 us 16.503 us]
                           change: [-4.4614% -1.4725% +2.5033%] (p = 0.49 > 
0.05)
                           No change in performance detected.
   Found 6 outliers among 100 measurements (6.00%)
     3 (3.00%) high mild
     3 (3.00%) high severe
   
   neq 512                 time:   [15.396 us 15.550 us 15.813 us]
                           change: [-67.464% -66.399% -65.402%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     3 (3.00%) high mild
     1 (1.00%) high severe
   
   neq 512 simd            time:   [16.248 us 16.348 us 16.477 us]
                           change: [-5.7272% -5.0473% -4.3194%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     2 (2.00%) high mild
     2 (2.00%) high severe
   
   lt 512                  time:   [15.544 us 15.617 us 15.705 us]
                           change: [-63.654% -63.364% -63.078%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     3 (3.00%) high mild
     5 (5.00%) high severe
   
   lt 512 simd             time:   [16.309 us 16.502 us 16.796 us]
                           change: [-7.1156% -5.4121% -3.7540%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     1 (1.00%) high mild
     5 (5.00%) high severe
   
   lt_eq 512               time:   [16.197 us 16.797 us 17.577 us]
                           change: [-62.842% -60.475% -57.947%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 16 outliers among 100 measurements (16.00%)
     2 (2.00%) high mild
     14 (14.00%) high severe
   
   lt_eq 512 simd          time:   [16.391 us 16.549 us 16.755 us]
                           change: [-4.1794% -2.5540% -0.5409%] (p = 0.00 < 
0.05)
                           Change within noise threshold.
   Found 9 outliers among 100 measurements (9.00%)
     4 (4.00%) high mild
     5 (5.00%) high severe
   
   gt 512                  time:   [15.320 us 15.386 us 15.469 us]
                           change: [-64.783% -64.475% -64.077%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     1 (1.00%) high mild
     4 (4.00%) high severe
   
   gt 512 simd             time:   [16.428 us 16.579 us 16.824 us]
                           change: [-5.8809% -4.9818% -4.0636%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     1 (1.00%) high mild
     3 (3.00%) high severe
   
   gt_eq 512               time:   [15.373 us 15.423 us 15.476 us]
                           change: [-65.439% -65.034% -64.706%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   
   gt_eq 512 simd          time:   [16.248 us 16.405 us 16.662 us]
                           change: [-7.7800% -5.5240% -3.7804%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     3 (3.00%) high mild
     1 (1.00%) high severe
   
        Running target/release/deps/csv_writer-b937777743b12b28
   record_batches_to_csv   time:   [183.57 us 193.17 us 204.93 us]
                           change: [-17.694% -4.8343% +8.9742%] (p = 0.51 > 
0.05)
                           No change in performance detected.
   Found 4 outliers among 100 measurements (4.00%)
     1 (1.00%) high mild
     3 (3.00%) high severe
   
        Running target/release/deps/take_kernels-f3cc4f1980a08edc
   take u8 256             time:   [21.429 us 21.479 us 21.532 us]
                           change: [+4.0275% +4.5102% +5.0289%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 4 outliers among 100 measurements (4.00%)
     4 (4.00%) high mild
   
   take u8 512             time:   [39.899 us 40.042 us 40.204 us]
                           change: [-1.1695% -0.6056% -0.0752%] (p = 0.03 < 
0.05)
                           Change within noise threshold.
   Found 5 outliers among 100 measurements (5.00%)
     5 (5.00%) high mild
   
   take u8 1024            time:   [79.301 us 79.561 us 79.828 us]
                           change: [-1.6327% -1.0495% -0.4431%] (p = 0.00 < 
0.05)
                           Change within noise threshold.
   
   take i32 256            time:   [21.631 us 21.722 us 21.818 us]
                           change: [+3.5975% +4.3668% +5.1918%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high severe
   
   take i32 512            time:   [41.232 us 41.427 us 41.642 us]
                           change: [-3.7463% -3.3208% -2.9106%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) high mild
     1 (1.00%) high severe
   
   take i32 1024           time:   [80.877 us 81.279 us 81.730 us]
                           change: [-5.3008% -4.6572% -3.9401%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 9 outliers among 100 measurements (9.00%)
     1 (1.00%) low mild
     5 (5.00%) high mild
     3 (3.00%) high severe
   
   take bool 256           time:   [23.209 us 23.288 us 23.377 us]
                           change: [-3.4634% -2.9723% -2.4941%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     2 (2.00%) high mild
     2 (2.00%) high severe
   
   take bool 512           time:   [45.849 us 46.050 us 46.268 us]
                           change: [-2.0658% -1.3602% -0.7648%] (p = 0.00 < 
0.05)
                           Change within noise threshold.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high mild
   
   take bool 1024          time:   [90.650 us 91.065 us 91.501 us]
                           change: [-1.0199% -0.4763% +0.1308%] (p = 0.09 > 
0.05)
                           No change in performance detected.
   Found 6 outliers among 100 measurements (6.00%)
     5 (5.00%) high mild
     1 (1.00%) high severe
   ```
   </details>  
   
   0.16.0 included the change that I made to autovectorize some compute 
kernels. This mainly narrowed the performance gap between the non-SIMD and 
SIMD kernels (from 50-60% slower to 10-20% slower).
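
As a rough illustration of what "autovectorize" means here, the sketch below shows the general shape of loop that the compiler can typically turn into SIMD instructions on its own in release builds. It is an assumption-laden example over plain slices, with a hypothetical `add` function, and is not the kernel code from this crate.

```rust
/// Element-wise addition written as a simple zipped iterator chain.
/// With no bounds checks or early exits in the loop body, the optimizer
/// is usually free to vectorize it without explicit SIMD types.
fn add(left: &[f32], right: &[f32]) -> Vec<f32> {
    left.iter().zip(right.iter()).map(|(l, r)| l + r).collect()
}

fn main() {
    let left = vec![1.0_f32; 512];
    let right = vec![2.0_f32; 512];
    let sum = add(&left, &right);
    println!("first = {}, len = {}", sum[0], sum.len());
}
```

Whether such a loop is actually vectorized depends on the target and optimization level, so benchmarks like the ones in this description remain the way to confirm it.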
   
   <details>
   <summary>From 0.16.0 to no `packed_simd`</summary>
   
   ```rust
        Running target/release/deps/arithmetic_kernels-d263bafe1ecab93d
   add 512                 time:   [16.502 us 16.676 us 16.925 us]
                           change: [+5.7488% +7.2699% +9.9557%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 6 outliers among 100 measurements (6.00%)
     3 (3.00%) high mild
     3 (3.00%) high severe
   
   add 512 simd            time:   [16.679 us 16.836 us 17.066 us]
                           change: [+4.7016% +10.619% +16.149%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 4 outliers among 100 measurements (4.00%)
     2 (2.00%) high mild
     2 (2.00%) high severe
   
   subtract 512            time:   [16.786 us 17.005 us 17.282 us]
                           change: [+6.8124% +7.9200% +9.2584%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 13 outliers among 100 measurements (13.00%)
     8 (8.00%) high mild
     5 (5.00%) high severe
   
   subtract 512 simd       time:   [16.667 us 16.839 us 17.063 us]
                           change: [+17.579% +19.217% +20.952%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 7 outliers among 100 measurements (7.00%)
     1 (1.00%) high mild
     6 (6.00%) high severe
   
   multiply 512            time:   [17.637 us 19.687 us 21.976 us]
                           change: [+7.2304% +12.033% +19.261%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 10 outliers among 100 measurements (10.00%)
     1 (1.00%) high mild
     9 (9.00%) high severe
   
   multiply 512 simd       time:   [16.551 us 16.631 us 16.720 us]
                           change: [+14.169% +15.372% +16.456%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 3 outliers among 100 measurements (3.00%)
     2 (2.00%) high mild
     1 (1.00%) high severe
   
   divide 512              time:   [17.315 us 17.365 us 17.419 us]
                           change: [+3.4441% +3.9221% +4.4103%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) high mild
     1 (1.00%) high severe
   
   divide 512 simd         time:   [17.266 us 17.326 us 17.388 us]
                           change: [+6.7777% +7.2835% +7.7956%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 4 outliers among 100 measurements (4.00%)
     4 (4.00%) high mild
   
   sum 512 no simd         time:   [8.5583 us 8.6700 us 8.8042 us]
                           change: [+17.027% +18.834% +20.593%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 5 outliers among 100 measurements (5.00%)
     3 (3.00%) high mild
     2 (2.00%) high severe
   
   limit 512, 256 no simd  time:   [7.3637 us 7.4095 us 7.4616 us]
                           change: [+5.2888% +6.4103% +7.5328%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high severe
   
   limit 512, 512 no simd  time:   [7.3472 us 7.3736 us 7.4017 us]
                           change: [+5.8054% +6.9044% +7.9027%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 4 outliers among 100 measurements (4.00%)
     3 (3.00%) high mild
     1 (1.00%) high severe
   
        Running target/release/deps/array_from_vec-6e44aa2d195a3b96
   array_from_vec 128      time:   [431.24 ns 433.22 ns 435.41 ns]
                           change: [-0.6088% +1.3213% +2.9253%] (p = 0.15 > 
0.05)
                           No change in performance detected.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   
   array_from_vec 256      time:   [681.05 ns 686.51 ns 694.58 ns]
                           change: [+2.8104% +3.3521% +3.9589%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 3 outliers among 100 measurements (3.00%)
     2 (2.00%) high mild
     1 (1.00%) high severe
   
   array_from_vec 512      time:   [1.1477 us 1.1523 us 1.1576 us]
                           change: [+1.8839% +2.3623% +2.8052%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   
        Running target/release/deps/boolean_kernels-2fceb4e9cf7f69d5
   and                     time:   [49.531 us 49.661 us 49.796 us]
                           change: [-5.3433% -4.8688% -4.3550%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     2 (2.00%) high mild
     3 (3.00%) high severe
   
   and simd                time:   [9.2264 us 9.4280 us 9.7238 us]
                           change: [-12.786% -10.898% -8.7337%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 10 outliers among 100 measurements (10.00%)
     4 (4.00%) high mild
     6 (6.00%) high severe
   
   or                      time:   [49.827 us 50.029 us 50.253 us]
                           change: [-5.2611% -4.5904% -3.9306%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) high mild
     1 (1.00%) high severe
   
   or simd                 time:   [9.1050 us 9.1360 us 9.1669 us]
                           change: [-13.413% -12.471% -11.459%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 7 outliers among 100 measurements (7.00%)
     2 (2.00%) low mild
     4 (4.00%) high mild
     1 (1.00%) high severe
   
   not                     time:   [26.908 us 27.048 us 27.226 us]
                           change: [-5.7470% -4.5050% -3.0267%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 7 outliers among 100 measurements (7.00%)
     4 (4.00%) high mild
     3 (3.00%) high severe
   
   not simd                time:   [4.9087 us 4.9665 us 5.0441 us]
                           change: [-9.5757% -8.4957% -7.3691%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 7 outliers among 100 measurements (7.00%)
     2 (2.00%) high mild
     5 (5.00%) high severe
   
        Running target/release/deps/builder-47e9dfab54e83426
   Benchmarking bench_primitive: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 20.7s or reduce sample count to 30.
   bench_primitive         time:   [4.0583 ms 4.0759 ms 4.0959 ms]
                           thrpt:  [976.58 MiB/s 981.38 MiB/s 985.64 MiB/s]
                    change:
                           time:   [+3.6774% +4.9561% +5.9081%] (p = 0.00 < 
0.05)
                           thrpt:  [-5.5785% -4.7220% -3.5469%]
                           Performance has regressed.
   Found 9 outliers among 100 measurements (9.00%)
     8 (8.00%) high mild
     1 (1.00%) high severe
   
   Benchmarking bench_bool: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 13.3s or reduce sample count to 40.
   bench_bool              time:   [2.6125 ms 2.6395 ms 2.6794 ms]
                           thrpt:  [186.61 MiB/s 189.43 MiB/s 191.39 MiB/s]
                    change:
                           time:   [+2.4849% +3.4680% +4.3829%] (p = 0.00 < 
0.05)
                           thrpt:  [-4.1989% -3.3517% -2.4246%]
                           Performance has regressed.
   Found 4 outliers among 100 measurements (4.00%)
     2 (2.00%) high mild
     2 (2.00%) high severe
   
        Running target/release/deps/cast_kernels-c9770d72fe9b204b
   cast int32 to int32 512 time:   [360.61 ns 363.39 ns 367.77 ns]
                           change: [-25.632% -21.210% -16.494%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high severe
   
   cast int32 to uint32 512
                           time:   [14.567 us 14.603 us 14.645 us]
                           change: [+0.9026% +1.3147% +1.7565%] (p = 0.00 < 
0.05)
                           Change within noise threshold.
   Found 4 outliers among 100 measurements (4.00%)
     4 (4.00%) high mild
   
   cast int32 to float32 512
                           time:   [14.972 us 15.117 us 15.275 us]
                           change: [+0.4208% +1.2079% +2.0879%] (p = 0.00 < 
0.05)
                           Change within noise threshold.
   Found 8 outliers among 100 measurements (8.00%)
     5 (5.00%) high mild
     3 (3.00%) high severe
   
   cast int32 to float64 512
                           time:   [14.929 us 14.996 us 15.077 us]
                           change: [+0.4568% +0.9965% +1.5083%] (p = 0.00 < 
0.05)
                           Change within noise threshold.
   Found 3 outliers among 100 measurements (3.00%)
     2 (2.00%) high mild
     1 (1.00%) high severe
   
   cast int32 to int64 512 time:   [14.880 us 14.920 us 14.961 us]
                           change: [-0.0546% +0.3857% +0.8089%] (p = 0.08 > 
0.05)
                           No change in performance detected.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   cast float32 to int32 512
                           time:   [16.245 us 16.334 us 16.439 us]
                           change: [+1.8067% +2.7560% +3.6900%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 8 outliers among 100 measurements (8.00%)
     4 (4.00%) high mild
     4 (4.00%) high severe
   
   cast float64 to float32 512
                           time:   [15.802 us 15.852 us 15.905 us]
                           change: [+1.9591% +2.7809% +3.4604%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 3 outliers among 100 measurements (3.00%)
     2 (2.00%) high mild
     1 (1.00%) high severe
   
   cast float64 to uint64 512
                           time:   [16.293 us 16.333 us 16.374 us]
                           change: [+6.0229% +6.4724% +6.9067%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   
   cast int64 to int32 512 time:   [14.526 us 14.591 us 14.668 us]
                           change: [+4.2376% +4.7904% +5.2952%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   
   cast date64 to date32 512
                           time:   [16.931 us 17.066 us 17.226 us]
                           change: [+1.0479% +2.0576% +2.9920%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 8 outliers among 100 measurements (8.00%)
     4 (4.00%) high mild
     4 (4.00%) high severe
   
   cast date32 to date64 512
                           time:   [16.583 us 16.648 us 16.713 us]
                           change: [+2.7951% +3.6554% +4.3566%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   cast time32s to time32ms 512
                           time:   [2.1082 us 2.1162 us 2.1253 us]
                           change: [+69.447% +71.604% +73.385%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 7 outliers among 100 measurements (7.00%)
     4 (4.00%) high mild
     3 (3.00%) high severe
   
   cast time32s to time64us 512
                           time:   [17.054 us 17.121 us 17.197 us]
                           change: [+5.1406% +5.8858% +6.5773%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) high mild
     1 (1.00%) high severe
   
   cast time64ns to time32s 512
                           time:   [19.334 us 19.544 us 19.791 us]
                           change: [+5.1059% +6.0233% +6.9969%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 10 outliers among 100 measurements (10.00%)
     4 (4.00%) high mild
     6 (6.00%) high severe
   
   cast timestamp_ns to timestamp_s 512
                           time:   [555.04 ns 560.31 ns 567.19 ns]
                           change: [+35.093% +43.293% +50.432%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   
   cast timestamp_ms to timestamp_ns 512
                           time:   [2.4111 us 2.4192 us 2.4283 us]
                           change: [+29.614% +30.468% +31.257%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 4 outliers among 100 measurements (4.00%)
     2 (2.00%) high mild
     2 (2.00%) high severe
   
   cast timestamp_ms to i64 512
                           time:   [692.68 ns 695.43 ns 698.67 ns]
                           change: [+10.064% +11.531% +12.672%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 4 outliers among 100 measurements (4.00%)
     4 (4.00%) high mild
   
        Running target/release/deps/comparison_kernels-0133fee8f9747e38
   eq 512                  time:   [17.686 us 17.944 us 18.310 us]
                           change: [+14.993% +15.993% +17.578%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 6 outliers among 100 measurements (6.00%)
     2 (2.00%) high mild
     4 (4.00%) high severe
   
   eq 512 simd             time:   [17.859 us 17.937 us 18.016 us]
                           change: [+2.0950% +6.1522% +9.2180%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   neq 512                 time:   [17.483 us 17.547 us 17.625 us]
                           change: [+12.430% +13.250% +13.942%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 5 outliers among 100 measurements (5.00%)
     5 (5.00%) high mild
   
   neq 512 simd            time:   [17.413 us 17.522 us 17.678 us]
                           change: [+6.9202% +8.2899% +10.309%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 3 outliers among 100 measurements (3.00%)
     2 (2.00%) high mild
     1 (1.00%) high severe
   
   lt 512                  time:   [17.710 us 17.893 us 18.162 us]
                           change: [+13.270% +14.421% +15.629%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 4 outliers among 100 measurements (4.00%)
     1 (1.00%) high mild
     3 (3.00%) high severe
   
   lt 512 simd             time:   [17.684 us 17.790 us 17.921 us]
                           change: [+6.6510% +8.3058% +9.8628%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 7 outliers among 100 measurements (7.00%)
     3 (3.00%) high mild
     4 (4.00%) high severe
   
   lt_eq 512               time:   [17.597 us 17.660 us 17.728 us]
                           change: [-7.2662% -0.7635% +5.7716%] (p = 0.83 > 
0.05)
                           No change in performance detected.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high mild
   
   lt_eq 512 simd          time:   [17.625 us 17.713 us 17.819 us]
                           change: [+4.9823% +7.0228% +8.8962%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 4 outliers among 100 measurements (4.00%)
     2 (2.00%) high mild
     2 (2.00%) high severe
   
   gt 512                  time:   [17.603 us 17.890 us 18.252 us]
                           change: [+12.682% +14.159% +15.481%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 6 outliers among 100 measurements (6.00%)
     3 (3.00%) high mild
     3 (3.00%) high severe
   
   gt 512 simd             time:   [17.557 us 17.724 us 17.914 us]
                           change: [+5.7046% +6.8432% +8.2057%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 8 outliers among 100 measurements (8.00%)
     2 (2.00%) high mild
     6 (6.00%) high severe
   
   gt_eq 512               time:   [17.544 us 17.610 us 17.684 us]
                           change: [+14.041% +16.179% +19.229%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 8 outliers among 100 measurements (8.00%)
     4 (4.00%) high mild
     4 (4.00%) high severe
   
   gt_eq 512 simd          time:   [17.749 us 18.011 us 18.395 us]
                           change: [+8.8024% +10.461% +12.459%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 4 outliers among 100 measurements (4.00%)
     1 (1.00%) high mild
     3 (3.00%) high severe
   
        Running target/release/deps/csv_writer-9387f0497783e820
   record_batches_to_csv   time:   [179.16 us 189.32 us 202.58 us]
                           change: [-11.723% -3.6958% +5.6507%] (p = 0.42 > 
0.05)
                           No change in performance detected.
   Found 4 outliers among 100 measurements (4.00%)
     1 (1.00%) high mild
     3 (3.00%) high severe
   
        Running target/release/deps/take_kernels-0b0e071e2159c546
   take u8 256             time:   [21.422 us 21.507 us 21.598 us]
                           change: [-0.4883% -0.0299% +0.4050%] (p = 0.88 > 
0.05)
                           No change in performance detected.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   
   take u8 512             time:   [41.737 us 41.850 us 41.971 us]
                           change: [+3.9654% +4.4028% +4.8348%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high mild
   
   take u8 1024            time:   [83.299 us 84.170 us 85.421 us]
                           change: [+5.2941% +6.1255% +7.3670%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 4 outliers among 100 measurements (4.00%)
     2 (2.00%) high mild
     2 (2.00%) high severe
   
   take i32 256            time:   [21.295 us 21.428 us 21.594 us]
                           change: [-3.7994% -2.8372% -1.8433%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     1 (1.00%) high mild
     4 (4.00%) high severe
   
   take i32 512            time:   [41.618 us 41.774 us 41.956 us]
                           change: [+0.1984% +0.7887% +1.4393%] (p = 0.01 < 
0.05)
                           Change within noise threshold.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) high mild
     1 (1.00%) high severe
   
   take i32 1024           time:   [83.318 us 85.386 us 87.753 us]
                           change: [+2.6187% +4.5585% +6.8482%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 13 outliers among 100 measurements (13.00%)
     1 (1.00%) high mild
     12 (12.00%) high severe
   
   take bool 256           time:   [23.326 us 23.391 us 23.460 us]
                           change: [-0.1620% +0.3110% +0.7507%] (p = 0.19 > 
0.05)
                           No change in performance detected.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) low severe
   
   take bool 512           time:   [47.138 us 47.325 us 47.518 us]
                           change: [+3.1552% +3.8603% +4.7518%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high severe
   
   take bool 1024          time:   [91.152 us 91.459 us 91.783 us]
                           change: [-0.0296% +0.5257% +1.0275%] (p = 0.05 < 
0.05)
                           Change within noise threshold.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high mild
   ```
   </details>  
   
   * Arithmetic kernels are slower by up to 20%
   * Boolean kernels are faster by 5-10%
   * Comparison kernels are slower by up to 8% (ignore the non-SIMD results)
   * Cast kernels regress by varying degrees, with a few functions around 40% slower. When I wrote the cast kernels, I had to pick the faster of the available options for casting temporal types, so I need to revisit these to fix the extreme perf drops.
   
   ## Are the perf drops worth it?
   
   I suppose it boils down to whether getting closer to stable Rust (without feature flags) is worth the slight performance drop.
   
   ## Outstanding work to do
   
   - [ ] Remove some benchmarks that become redundant (SIMD vs non-SIMD)
   - [ ] Fix the divide-by-zero check in the `divide` kernel (it currently tests null slots for 0)
   - [ ] Tweak temporal casts to find faster options
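
For the divide-by-zero item, here is a minimal sketch of the kind of check I have in mind. It is not the kernel's actual code: it uses plain slices plus `bool` validity slices instead of Arrow buffers and bitmaps, and `DivideError` and `divide_checked` are hypothetical names made up for illustration. The idea is to consult validity first, so that a zero sitting behind a null slot never triggers the divide-by-zero error.

```rust
/// Hypothetical error type for this sketch; not the Arrow crate's error enum.
#[derive(Debug, PartialEq)]
enum DivideError {
    LengthMismatch,
    DivideByZero,
}

/// Element-wise `left / right`. `left_valid[i] == false` (or `right_valid[i] == false`)
/// marks slot `i` as null. Returns the output values and the output validity.
fn divide_checked(
    left: &[i32],
    left_valid: &[bool],
    right: &[i32],
    right_valid: &[bool],
) -> Result<(Vec<i32>, Vec<bool>), DivideError> {
    let len = left.len();
    if right.len() != len || left_valid.len() != len || right_valid.len() != len {
        return Err(DivideError::LengthMismatch);
    }

    let mut values = Vec::with_capacity(len);
    let mut valid = Vec::with_capacity(len);
    for i in 0..len {
        if left_valid[i] && right_valid[i] {
            // Only a *valid* divisor of 0 is an error.
            if right[i] == 0 {
                return Err(DivideError::DivideByZero);
            }
            // wrapping_div avoids the overflow panic on i32::MIN / -1.
            values.push(left[i].wrapping_div(right[i]));
            valid.push(true);
        } else {
            // Null slot: whatever data sits behind it (often 0) is ignored.
            values.push(0);
            valid.push(false);
        }
    }
    Ok((values, valid))
}
```

With this shape, dividing `[10, 7, 9]` by `[2, 0, 3]` succeeds when the middle divisor is null, and only fails when a valid divisor is 0. Whether a valid zero divisor should return an error or produce a null is a separate design question that this sketch does not settle.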

