dmatth1 commented on PR #10011:
URL: https://github.com/apache/arrow-rs/pull/10011#issuecomment-4526481867

   Tested locally on aarch64 too (Apple Silicon M1, baseline NEON autovec):
   
     | Regime    | Path   | Scalar | Autovec | Speedup |
     |-----------|--------|-------:|--------:|--------:|
     | S 128 KiB | miss   |   4.61 |    3.24 | 1.42x   |
     | S 128 KiB | hit    |   6.84 |    3.17 | 2.16x   |
     | S 128 KiB | insert |   3.25 |    3.19 | 1.02x   |
     | M 2 MiB   | miss   |   5.20 |    3.24 | 1.61x   |
     | M 2 MiB   | hit    |   7.16 |    3.26 | 2.20x   |
     | M 2 MiB   | insert |   3.34 |    3.31 | 1.01x   |
     | L 32 MiB  | miss   |   6.66 |    5.42 | 1.23x   |
     | L 32 MiB  | hit    |   9.72 |    5.25 | 1.85x   |
     | L 32 MiB  | insert |   5.19 |    5.38 | 0.96x   |
   
   This is a big simplification. The new commit message includes details on how autovec reduces/lowers instructions. I'm going to force-push to use this approach.
   
   One thing beyond your suggestion: I prototyped a runtime AVX2-detect shim and dropped it in favor of the simpler approach (no `unsafe`, no `Sbbf` field, no hot-path branch), since users who care about AVX2 probably already set `-C target-cpu=...`.
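
   For readers following along, here is a minimal sketch of the kind of safe, branch-free block code that a compiler can autovectorize. It is not the code from this PR: the `Block` type and its method names are hypothetical, and the only things taken from elsewhere are the eight salt constants and the mask formula from the Parquet split-block Bloom filter specification. Whether the loops actually become NEON or AVX2 instructions depends on the target and optimization level.

```rust
/// Salt constants from the Parquet split-block Bloom filter specification.
const SALT: [u32; 8] = [
    0x47b6137b, 0x44974d91, 0x8824ad5b, 0xa2b7289d,
    0x705495c7, 0x2df1424b, 0x9efc4947, 0x5c6bfb31,
];

/// One 256-bit block made of eight 32-bit lanes.
/// Hypothetical type for illustration, not the PR's actual layout.
#[derive(Clone, Copy, Default)]
pub struct Block([u32; 8]);

impl Block {
    /// Build the per-lane mask: one bit set in each of the eight lanes.
    /// The fixed-length loop has no data-dependent branches, which is
    /// what gives the optimizer a chance to vectorize it.
    #[inline]
    fn mask(x: u32) -> [u32; 8] {
        let mut m = [0u32; 8];
        for i in 0..8 {
            // The top 5 bits of the product select a bit position in 0..=31.
            m[i] = 1u32 << (x.wrapping_mul(SALT[i]) >> 27);
        }
        m
    }

    /// Set the mask bits in every lane.
    #[inline]
    pub fn insert(&mut self, x: u32) {
        let m = Self::mask(x);
        for i in 0..8 {
            self.0[i] |= m[i];
        }
    }

    /// Return true if every mask bit is already set.
    /// Missing bits are accumulated with OR instead of returning early,
    /// so the loop body stays branch-free.
    #[inline]
    pub fn check(&self, x: u32) -> bool {
        let m = Self::mask(x);
        let mut missing = 0u32;
        for i in 0..8 {
            missing |= !self.0[i] & m[i];
        }
        missing == 0
    }
}
```

   A caller would create a block with `Block::default()`, call `insert` on the 32-bit key, and then call `check`. Nothing here needs `unsafe` or runtime feature detection; the instruction set is chosen entirely at compile time, which is the trade-off described above for users who set `-C target-cpu=...`.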

