alamb commented on issue #7095: URL: https://github.com/apache/datafusion/issues/7095#issuecomment-2541343823
> Do you mean `hash_utils::create_hashes` is vectorized (SIMDed)? Actually I haven't successfully find such SIMD instruction in group aggregation yet. Maybe my compiler configuration is incorrect

What I really mean is that https://docs.rs/datafusion/latest/datafusion/common/hash_utils/fn.create_hashes.html creates the hashes a column at a time (a vector) rather than one row at a time.

At the very least this is likely faster than hashing each row separately, as there is one function call per batch rather than one per row. I think it also gives the compiler a better chance to actually use SIMD CPU instructions to compute the hashes. This is what I believe is referred to as "auto vectorization".

However, I have not verified that the Rust compiler actually does use SIMD instructions for `create_hashes`. Maybe that is something worth looking into.

This is what @XiangpengHao did when optimizing StringView: looked carefully at the assembly produced by https://godbolt.org/z/1jhc1hae1 and verified / tweaked the code until it was using the expected instructions.
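To illustrate the column-at-a-time vs row-at-a-time distinction, here is a minimal sketch. It is not the actual `create_hashes` implementation: `mix`, `combine`, `hash_row` and `hash_columns` are hypothetical names, the mixing function is a placeholder rather than a real hasher, and columns are plain `Vec<u64>` instead of Arrow arrays.

```rust
/// Hypothetical mixing function that stands in for a real hasher.
#[inline]
fn mix(value: u64) -> u64 {
    let h = value.wrapping_mul(0x9E37_79B9_7F4A_7C15);
    h ^ (h >> 32)
}

/// Hypothetical way of folding a new column hash into an existing row hash.
#[inline]
fn combine(acc: u64, h: u64) -> u64 {
    acc.wrapping_mul(31).wrapping_add(h)
}

/// Row at a time: one call per row, visiting every column for that row.
fn hash_row(columns: &[Vec<u64>], row: usize) -> u64 {
    let mut acc = 0u64;
    for col in columns {
        acc = combine(acc, mix(col[row]));
    }
    acc
}

/// Column at a time: one call per batch. The inner loop walks two
/// contiguous slices, which is the shape of loop a compiler may be
/// able to auto-vectorize (not verified here).
fn hash_columns(columns: &[Vec<u64>], hashes: &mut [u64]) {
    for col in columns {
        assert_eq!(col.len(), hashes.len());
        for (h, v) in hashes.iter_mut().zip(col.iter()) {
            *h = combine(*h, mix(*v));
        }
    }
}

fn main() {
    let columns = vec![vec![1u64, 2, 3, 4], vec![10u64, 20, 30, 40]];

    // One function call per row.
    let per_row: Vec<u64> = (0..4).map(|r| hash_row(&columns, r)).collect();

    // One function call for the whole batch.
    let mut batch = vec![0u64; 4];
    hash_columns(&columns, &mut batch);

    // Both strategies produce the same hashes; only the loop order differs.
    assert_eq!(per_row, batch);
    println!("{:?}", batch);
}
```

Whether the inner loop in `hash_columns` actually compiles to SIMD instructions depends on the hash function, the target CPU and the compiler flags, so the generated assembly would still need to be inspected (for example with `rustc -O --emit asm` or on godbolt).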