Dandandan edited a comment on pull request #10002:
URL: https://github.com/apache/arrow/pull/10002#issuecomment-819747206


   > I think this is a great idea @Dandandan
   > 
   > I ran the benchmark tests on my laptop and I saw similar improvement 
numbers (master is 30% slower than this branch).
   > 
   > I vote we :shipit:
   > 
   > Given how effective this is, perhaps we can add a similar thing to other 
kernels (I could file a ticket if so)
   > 
   > FYI @andygrove and @nevi-me
   
   I think it would be useful to check which kernels / functions can benefit 
from some more SIMD instructions. We might also need to play a bit with things 
like `inline` attributes, as non-inlined code might not benefit from the same 
optimizations, since in that case the function is reused for both code paths. 
Also, auto-vectorization doesn't always work and, as we see here, it is not 
nearly as effective as the "manual" implementation (still ~4x faster!).
   
   As we are using packed_simd2, which also uses only the standard instruction 
set in `target-features` (i.e. sse2), I believe the same idea could be used 
there without resorting to emitting instructions unconditionally (like we do 
for the `avx_512` feature at the moment).
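   
   A minimal sketch of the idea, not the code from this PR: one kernel body 
marked `#[inline(always)]`, a thin wrapper compiled with 
`#[target_feature(enable = "avx2")]`, and a runtime check that picks between 
them. The names `sum_impl`, `sum_avx2` and `sum` are hypothetical, and AVX2 is 
only an example feature. If `sum_impl` were not inlined into the wrapper, both 
code paths would presumably share one copy compiled for the baseline target, 
which is the concern about `inline` attributes above.

```rust
// Hypothetical kernel body (illustrative name, not Arrow's API).
// `inline(always)` lets each caller compile its own copy with the
// target features that are enabled for that caller.
#[inline(always)]
fn sum_impl(values: &[f32]) -> f32 {
    // Eight independent accumulators make the loop easy to auto-vectorize.
    let mut acc = [0.0f32; 8];
    let chunks = values.chunks_exact(8);
    let rem = chunks.remainder();
    for chunk in chunks {
        for i in 0..8 {
            acc[i] += chunk[i];
        }
    }
    acc.iter().sum::<f32>() + rem.iter().sum::<f32>()
}

// Same body, but compiled with AVX2 enabled for this function only.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(values: &[f32]) -> f32 {
    sum_impl(values)
}

// Public entry point: choose the code path at runtime.
pub fn sum(values: &[f32]) -> f32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // Safety: only reached after the runtime check confirmed AVX2.
            return unsafe { sum_avx2(values) };
        }
    }
    // Fallback compiled for the baseline target features (e.g. sse2).
    sum_impl(values)
}
```

   With this shape the binary still runs on CPUs without AVX2, since the 
wider instructions are only executed after the check succeeds. Whether the 
compiler actually vectorizes the loop in either path would need to be 
confirmed with benchmarks or by inspecting the generated assembly.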


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

