westonpace commented on PR #13487: URL: https://github.com/apache/arrow/pull/13487#issuecomment-1190658559
Here is my understanding, which may not be complete: `util/hashing.h` is a more-or-less direct adapter between Arrow and xxhash. `key_hash.cc` is based on xxhash, but most of the algorithms have been reimplemented to some degree. The `key_hash.cc` utilities are, in theory, better suited to take advantage of columnar formats and/or vectorized CPUs. However, as part of that rewrite, certain sacrifices were made in hash quality in favor of performance. `key_hash.cc` is primarily (only?) used today in Acero, for the hash-join and hash-aggregate. In theory, `util/hashing.h` should have a better distribution but worse performance than `key_hash.cc`.
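To make the quality-versus-performance trade-off concrete, here is a minimal sketch. Neither function is taken from Arrow: `cheap_hash` and `mixed_hash` are hypothetical stand-ins (a single multiply versus the MurmurHash3 64-bit finalizer), and the key pattern is chosen deliberately to expose the weaker one.

```python
from collections import Counter

MASK64 = (1 << 64) - 1


def cheap_hash(x: int) -> int:
    """One multiply: fast, but the low output bits depend only on the low input bits."""
    return (x * 0x9E3779B1) & 0xFFFFFFFF


def mixed_hash(x: int) -> int:
    """MurmurHash3 64-bit finalizer: more work per key, much better bit mixing."""
    x &= MASK64
    x ^= x >> 33
    x = (x * 0xFF51AFD7ED558CCD) & MASK64
    x ^= x >> 33
    x = (x * 0xC4CEB9FE1A85EC53) & MASK64
    x ^= x >> 33
    return x


def max_bucket_load(hash_fn, keys, num_buckets):
    """Largest number of keys in any one bucket (num_buckets must be a power of two)."""
    counts = Counter(hash_fn(k) & (num_buckets - 1) for k in keys)
    return max(counts.values())


# Structured keys: every key is a multiple of 1024, so its low 10 bits are zero.
keys = [i * 1024 for i in range(4096)]

print("cheap:", max_bucket_load(cheap_hash, keys, 1024))
print("mixed:", max_bucket_load(mixed_hash, keys, 1024))
```

With these keys the cheap hash puts every key in the same bucket, while the mixed hash spreads them out. That is the kind of distribution difference I mean, although the real gap between the two Arrow implementations is presumably far less extreme.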
