alamb commented on issue #7095:
URL: https://github.com/apache/datafusion/issues/7095#issuecomment-2541343823

   > Do you mean `hash_utils::create_hashes` is vectorized (SIMDed)? Actually I 
haven't successfully find such SIMD instruction in group aggregation yet. Maybe 
my compiler configuration is incorrect
   
   What I really mean is that 
https://docs.rs/datafusion/latest/datafusion/common/hash_utils/fn.create_hashes.html
 creates the hashes a column at a time (a vector) rather than one row at a time
   
   At the very least this is likely faster than calling it on each row as there 
is one function call per batch rather than per row. 
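   
   To make the difference concrete, here is a minimal sketch in plain Rust. It is not the actual `create_hashes` implementation: `mix`, `hash_row`, `hash_column` and `hash_batch` are hypothetical names, the columns are assumed to be plain `u64` values, and the mixing function is just a stand-in for a real hash.

```rust
/// Hypothetical stand-in for a real hash function (one multiply + xor-shift step).
#[inline]
fn mix(value: u64, seed: u64) -> u64 {
    let x = (value ^ seed).wrapping_mul(0x9E37_79B9_7F4A_7C15);
    x ^ (x >> 32)
}

/// Row-at-a-time: called once per row, combining that row's value from every column.
fn hash_row(row: &[u64], seed: u64) -> u64 {
    row.iter().fold(seed, |acc, &v| mix(v, acc))
}

/// Column-at-a-time: called once per column per batch. The body is a simple
/// loop over two slices that updates every row's running hash in place.
fn hash_column(column: &[u64], hashes: &mut [u64]) {
    assert_eq!(column.len(), hashes.len());
    for (h, &v) in hashes.iter_mut().zip(column) {
        *h = mix(v, *h);
    }
}

/// Hash a whole batch by folding each column into the shared hash buffer.
fn hash_batch(columns: &[Vec<u64>], seed: u64) -> Vec<u64> {
    let num_rows = columns.first().map_or(0, |c| c.len());
    let mut hashes = vec![seed; num_rows];
    for column in columns {
        hash_column(column, &mut hashes);
    }
    hashes
}
```

   Both paths produce the same hash for each row; the column version just makes one call per column per batch instead of one call per row.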
   
   I think it also gives the compiler a better chance to actually use SIMD CPU 
instructions to compute the hashes. This is what I believe is referred to as 
"auto vectorization". 
   
   However, I have not verified that the Rust compiler actually does use SIMD 
instructions for `create_hashes`. Maybe that is something worth looking into.
   
   This is what @XiangpengHao did when optimizing StringView: looking carefully 
at the assembly produced by https://godbolt.org/z/1jhc1hae1 and verifying / 
tweaking the code until it used the expected instructions.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

