[GitHub] [arrow-rs] tustvold commented on pull request #1830: Remove simd and avx512 bitwise kernels in favor of autovectorization

GitBox Fri, 10 Jun 2022 09:09:59 -0700


tustvold commented on PR #1830:
URL: https://github.com/apache/arrow-rs/pull/1830#issuecomment-1152520577


   On an Intel Cascade Lake Xeon(R) CPU @ 3.10GHz, specifically a GCP c2-standard-16
   
   ### Nightly with defaults
   
   ```
   buffer_bit_ops and      time:   [419.91 ns 420.01 ns 420.12 ns]
   buffer_bit_ops or       time:   [550.34 ns 550.43 ns 550.54 ns]
   ```
   
   ### Nightly with simd
   
   ```
   buffer_bit_ops and      time:   [252.84 ns 253.71 ns 254.75 ns]
   buffer_bit_ops or       time:   [276.70 ns 276.77 ns 276.85 ns]
   ```
   
   ### Nightly with avx512
   
   ```
   buffer_bit_ops and      time:   [338.87 ns 339.09 ns 339.37 ns]
   buffer_bit_ops or       time:   [365.72 ns 365.78 ns 365.86 ns]
   ```
   
   ### Nightly with defaults and RUSTFLAGS="-Ctarget-cpu=native"
   
   ```
   buffer_bit_ops and      time:   [177.27 ns 177.32 ns 177.38 ns]
   buffer_bit_ops or       time:   [290.42 ns 290.47 ns 290.52 ns]
   ```
   
   ### Nightly with simd and RUSTFLAGS="-Ctarget-cpu=native"
   
   ```
   buffer_bit_ops and      time:   [199.39 ns 199.42 ns 199.45 ns]
   buffer_bit_ops or       time:   [227.88 ns 227.93 ns 227.98 ns]
   ```
   
   ### Nightly with avx512 and RUSTFLAGS="-Ctarget-cpu=native"
   
   ```
   buffer_bit_ops and      time:   [199.58 ns 199.64 ns 199.73 ns]
   buffer_bit_ops or       time:   [229.27 ns 229.30 ns 229.34 ns]
   ```
   
   ### Nightly with defaults and RUSTFLAGS="-Ctarget-cpu=native -Ctarget-feature=-prefer-256-bit"
   
   ```
   buffer_bit_ops and      time:   [166.14 ns 166.19 ns 166.26 ns]
   buffer_bit_ops or       time:   [208.24 ns 208.30 ns 208.36 ns]
   ```
   
   ### Nightly with simd and RUSTFLAGS="-Ctarget-cpu=native -Ctarget-feature=-prefer-256-bit"
   
   ```
   buffer_bit_ops and      time:   [197.55 ns 197.58 ns 197.60 ns]
   buffer_bit_ops or       time:   [223.72 ns 223.79 ns 223.86 ns]
   ```
   
   ### Nightly with avx512 and RUSTFLAGS="-Ctarget-cpu=native -Ctarget-feature=-prefer-256-bit"
   
   ```
   buffer_bit_ops and      time:   [200.34 ns 200.38 ns 200.41 ns]
   buffer_bit_ops or       time:   [328.80 ns 328.84 ns 328.89 ns]
   ```
   
   ### Stable with defaults and RUSTFLAGS="-Ctarget-cpu=native"
   
   ```
   buffer_bit_ops and      time:   [178.72 ns 178.77 ns 178.82 ns]
   buffer_bit_ops or       time:   [294.65 ns 294.69 ns 294.74 ns]
   ```
   
   ### Stable with defaults and RUSTFLAGS="-Ctarget-cpu=native -Ctarget-feature=-prefer-256-bit"
   
   ```
   buffer_bit_ops and      time:   [176.34 ns 176.82 ns 177.45 ns]
   buffer_bit_ops or       time:   [200.99 ns 201.08 ns 201.17 ns]
   ```
   
   # Conclusion
   
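   * The simd feature is always faster than the avx512 feature, though with `target-cpu=native` alone the gap is under 1%
   * With `target-cpu=native` the LLVM generated `buffer_bit_ops` is faster than the simd version for `and` (177 ns vs 199 ns), but not for `or` (290 ns vs 228 ns)
   * With `target-feature=-prefer-256-bit` the LLVM generated code is better than either of the hand-rolled loops for both `and` and `or`
   * Performance between stable and nightly is very similar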
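
For readers unfamiliar with what "LLVM generated" means here, the following is a minimal sketch of the kind of plain loop that the compiler can autovectorize. It is not the arrow-rs implementation: `bitwise_bin_op`, `buffer_and` and `buffer_or` are hypothetical names chosen for illustration, and the sketch works byte by byte on `&[u8]` slices rather than on Arrow buffers. Whether LLVM actually emits 256-bit or 512-bit instructions for it depends on the `RUSTFLAGS` used, as the numbers above suggest.

```rust
/// Hypothetical helper (not part of arrow-rs): applies `op` to each pair of
/// bytes from two equal-length buffers and collects the results.
///
/// The loop body has no branches and no cross-iteration dependencies, which
/// is the shape LLVM's autovectorizer generally handles well.
fn bitwise_bin_op(left: &[u8], right: &[u8], op: impl Fn(u8, u8) -> u8) -> Vec<u8> {
    assert_eq!(left.len(), right.len(), "buffers must have equal length");
    left.iter().zip(right).map(|(&l, &r)| op(l, r)).collect()
}

/// Bitwise AND of two buffers, written without any explicit SIMD intrinsics.
fn buffer_and(left: &[u8], right: &[u8]) -> Vec<u8> {
    bitwise_bin_op(left, right, |l, r| l & r)
}

/// Bitwise OR of two buffers, written without any explicit SIMD intrinsics.
fn buffer_or(left: &[u8], right: &[u8]) -> Vec<u8> {
    bitwise_bin_op(left, right, |l, r| l | r)
}
```

Building such code with `RUSTFLAGS="-Ctarget-cpu=native"` lets the compiler pick the widest vector instructions it considers profitable for the host CPU, which is the configuration the "defaults" rows above measure against the hand-rolled simd and avx512 kernels.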
   


