westonpace commented on PR #13487:
URL: https://github.com/apache/arrow/pull/13487#issuecomment-1190658559

   Here is my understanding, which may not be complete:
   
   `util/hashing.h` is a more-or-less direct adapter between Arrow and xxhash.
   
   `key_hash.cc` is based on xxhash, but most of the algorithms have been 
reimplemented to some degree. The `key_hash.cc` utilities are, in theory, 
better suited to take advantage of columnar formats and/or vectorized CPUs.  
However, as part of this rewrite, some hash quality was sacrificed in favor of 
performance.  `key_hash.cc` is primarily (only?) used today in Acero, for the 
hash-join and hash-aggregate.
   
   In theory, `util/hashing.h` should have a better distribution but worse 
performance than `key_hash.cc`.
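
   To make the trade-off concrete, here is a minimal, self-contained sketch. 
It does not use any Arrow API, and neither function is taken from 
`util/hashing.h` or `key_hash.cc`. `StrongHash` and `CheapHashColumn` are 
hypothetical names: the first is the well-known splitmix64 finalizer applied 
one value at a time, the second is a deliberately weaker multiply-and-fold hash 
written as a tight loop over a whole column, which is the shape a compiler can 
usually auto-vectorize.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical "better distribution" hash: the splitmix64 finalizer.
// Two multiplies and three xor-shifts per value, called one value at a time.
inline uint64_t StrongHash(uint64_t x) {
  x ^= x >> 30;
  x *= 0xbf58476d1ce4e5b9ULL;
  x ^= x >> 27;
  x *= 0x94d049bb133111ebULL;
  x ^= x >> 31;
  return x;
}

// Hypothetical "cheaper" hash: one multiply and one xor-shift per value.
// It processes a whole column in a branch-free loop, trading some mixing
// quality for less work per element.
inline void CheapHashColumn(const uint64_t* values, size_t n, uint64_t* out) {
  for (size_t i = 0; i < n; ++i) {
    uint64_t h = values[i] * 0x9E3779B97F4A7C15ULL;
    out[i] = h ^ (h >> 32);
  }
}

// Convenience wrapper: hash an entire column with the cheap variant.
inline std::vector<uint64_t> HashColumn(const std::vector<uint64_t>& column) {
  std::vector<uint64_t> out(column.size());
  CheapHashColumn(column.data(), column.size(), out.data());
  return out;
}
```

   Which variant is appropriate depends on how sensitive the consumer is to 
collisions versus throughput. This sketch only illustrates the shape of the 
trade-off and says nothing about the actual numbers for either Arrow 
implementation.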
   
   

