tustvold commented on issue #12821: URL: https://github.com/apache/datafusion/issues/12821#issuecomment-2408966481
I found that LLVM is relatively good at vectorizing vertical operations provided:

* There are no conditionals within the loop body
* You've been careful to avoid inlining too much, as the vectorizer gives up if the code is too complex
* You aren't doing bitwise horizontal reductions or masking (although FWIW std::simd struggles with this as well)
* You've enabled SIMD instructions in the target ISA

This last point is likely why you aren't seeing anything: the default x86 ISA is over a decade old at this point and doesn't support pretty much any modern SIMD instructions. See the Performance Tips section at the end of https://crates.io/crates/arrow

My 2 cents is to get as far as you can without reaching for std::simd. There is a massive maintenance overhead, and with care LLVM can produce code that performs better than naively written manual SIMD. We used to have a fair bit of manual SIMD in arrow-rs, and over time we've removed it as the auto-vectorized code was faster.

I'd recommend getting familiar with tools like https://rust.godbolt.org/ (again being sure to set RUSTFLAGS), and only once you've exhausted that avenue think of reaching for SIMD. Generally the hard part is getting the algorithm structured in such a way that it _can_ be vectorised, regardless of what goes and generates those instructions.
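As a rough illustration of the kind of loop described above, here is a minimal sketch of two branch-free kernels. The function names `add_into` and `count_greater` are hypothetical and not taken from arrow-rs or DataFusion. Whether LLVM actually vectorizes them depends on the compiler version and target flags, so it is worth checking the generated assembly rather than assuming.

```rust
/// Element-wise ("vertical") add: each output lane depends only on the
/// matching input lanes, and the loop body has no branches.
/// Zipping the slices lets the compiler drop per-element bounds checks.
pub fn add_into(a: &[i32], b: &[i32], out: &mut [i32]) {
    for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
        // wrapping_add avoids the overflow check (and its panic branch)
        // that plain `+` carries in debug builds.
        *o = x.wrapping_add(*y);
    }
}

/// Counts the values above `threshold` without an `if` in the loop body:
/// the comparison result is turned into 0 or 1 and summed.
pub fn count_greater(values: &[i32], threshold: i32) -> usize {
    values.iter().map(|&v| (v > threshold) as usize).sum()
}
```

To see what gets generated, one option is to build with something like `RUSTFLAGS="-C target-cpu=native" cargo build --release`, or to paste the functions into https://rust.godbolt.org/ with `-C opt-level=3 -C target-cpu=native` as compiler flags, then compare against the output without `target-cpu` set.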
