tustvold commented on issue #12821:
URL: https://github.com/apache/datafusion/issues/12821#issuecomment-2408966481

   I found that LLVM is relatively good at vectorizing vertical operations 
provided:
   
   * There are no conditionals within the loop body
   * You've been careful to avoid inlining too much, as the vectorizer gives up 
if the code is too complex
   * You aren't doing bitwise horizontal reductions or masking (although FWIW 
std::simd struggles with this as well)
   * You've enabled SIMD instructions in the target ISA
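
   As a rough illustration of the first two points, here is a minimal sketch of 
loops in the shape the vectorizer tends to handle well. The function names 
`add_slices` and `relu_in_place` are made up for this example, and whether LLVM 
actually vectorizes them depends on the compiler version and target features, 
so check the generated assembly rather than assuming it.

```rust
/// Element-wise ("vertical") add: each output depends only on the inputs at
/// the same index, and the loop body contains no branches.
/// Hypothetical helper for illustration, not part of arrow-rs.
pub fn add_slices(a: &[i32], b: &[i32], out: &mut [i32]) {
    // Zipping the iterators avoids per-element bounds checks and stops at
    // the shortest of the three slices.
    for ((o, &x), &y) in out.iter_mut().zip(a).zip(b) {
        *o = x.wrapping_add(y);
    }
}

/// Clamp negative values to zero without writing an `if` in the loop body.
/// `max` is typically lowered to a select or a SIMD max, not a branch.
pub fn relu_in_place(values: &mut [i32]) {
    for v in values.iter_mut() {
        *v = (*v).max(0);
    }
}
```

   Both bodies are small and branch-free, which keeps them within what the 
vectorizer is usually willing to analyse.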
   
   This last point is likely why you aren't seeing anything: the default x86-64 
target baseline is well over a decade old and guarantees little beyond SSE2, so 
wider SIMD instructions such as AVX2 are not emitted unless you enable them. See 
the Performance Tips section at the end of https://crates.io/crates/arrow
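
   One way to confirm which SIMD features a build was actually compiled with is 
to query `cfg!(target_feature = "...")`. The sketch below is only an 
illustration: `compiled_simd_features` is a hypothetical helper, and the four 
features it checks are an arbitrary sample. Building with something like 
`RUSTFLAGS="-C target-cpu=native"` or `RUSTFLAGS="-C target-feature=+avx2"` 
should change what it reports.

```rust
/// Returns which of a few x86 SIMD features were enabled at compile time.
/// Hypothetical helper for illustration; the feature list is not exhaustive.
pub fn compiled_simd_features() -> Vec<&'static str> {
    let mut features = Vec::new();
    // `cfg!` is evaluated at compile time, so this reflects the target
    // features rustc was told to assume, not what the running CPU supports.
    if cfg!(target_feature = "sse2") {
        features.push("sse2");
    }
    if cfg!(target_feature = "sse4.2") {
        features.push("sse4.2");
    }
    if cfg!(target_feature = "avx2") {
        features.push("avx2");
    }
    if cfg!(target_feature = "avx512f") {
        features.push("avx512f");
    }
    features
}
```

   If "avx2" is missing from the result, the auto-vectorizer was not allowed to 
use AVX2 for that build, regardless of what the machine supports.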
   
   My 2 cents is to get as far as you can without reaching for std::simd: there 
is a massive maintenance overhead, and with care LLVM can produce code that 
performs better than naively written manual SIMD. We used to have a fair bit of 
manual SIMD in arrow-rs, and over time we've removed it as the auto-vectorized 
code was faster.
   
   I'd recommend getting familiar with tools like https://rust.godbolt.org/ 
(again being sure to set RUSTFLAGS), and only once you've exhausted that avenue 
think of reaching for manual SIMD. Generally the hard part is getting the 
algorithm structured in such a way that it _can_ be vectorised, regardless of 
what ends up generating those instructions.
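
   To make "structuring the algorithm" a bit more concrete, here is a sketch of 
one common restructuring, under stated assumptions. A plain sequential `f32` 
sum generally cannot be auto-vectorised because floating-point addition is not 
associative, so the compiler must preserve the order of the adds. Keeping 
several independent accumulators makes the inner loop a vertical operation 
again. The name `sum_lanes` and the lane count of 8 are assumptions for 
illustration, and the result can differ from a sequential sum in the low bits 
because the additions are reordered.

```rust
/// Assumed lane count for this sketch; a real choice depends on the target.
const LANES: usize = 8;

/// Sums `values` using LANES independent accumulators.
/// Hypothetical helper for illustration, not taken from arrow-rs.
pub fn sum_lanes(values: &[f32]) -> f32 {
    let mut acc = [0.0f32; LANES];
    let chunks = values.chunks_exact(LANES);
    // Elements that do not fill a whole chunk are handled separately below.
    let tail = chunks.remainder();

    for chunk in chunks {
        // Vertical step: accumulator i only ever sees element i of a chunk.
        for (a, &v) in acc.iter_mut().zip(chunk) {
            *a += v;
        }
    }

    // Single horizontal reduction at the end, outside the hot loop.
    let mut total: f32 = acc.iter().sum();
    for &v in tail {
        total += v;
    }
    total
}
```

   Whether the compiler turns the inner loop into SIMD adds is worth confirming 
in Compiler Explorer with the same RUSTFLAGS as the real build.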
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

