[GitHub] [arrow] drin commented on pull request #13487: ARROW-16945: [C++] Add new scalar compute function for 32-bit hashing

GitBox Wed, 20 Jul 2022 11:18:18 -0700


drin commented on PR #13487:
URL: https://github.com/apache/arrow/pull/13487#issuecomment-1190606383


   Also, I see now that `util/hashing.h` wraps xxHash, vendored from Cyan4973's 
github repo. Hashing32 has the following comment above it:
   
       // Implementations are based on xxh3 32-bit algorithm description from:
       // https://github.com/Cyan4973/xxHash/blob/dev/doc/xxhash_spec.md
   
   I need to experiment a bit, but I suspect that for int primitives, Hashing32 
and Hashing64 just wrap xxHash and add logic for iterating over rows and 
columns. Specifically, 
[key_hash.cc#L745](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/exec/key_hash.cc#L745)
 looks a lot like 
[hashing.h#L88](https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/hashing.h#L88).
 
[hashing_benchmark.cc#L70](https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/hashing_benchmark.cc#L70)
 benchmarks using 2 multipliers, and I think if it uses just the 1 multiplier 
then the implementation looks a lot like Hashing64.
   
   All this to say that I think it should be accommodated in the benchmark, but 
I also wonder if some of the hashing code can be simplified/better organized in 
case they aren't as independent as they seem at first glance.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] drin commented on pull request #13487: ARROW-16945: [C++] Add new scalar compute function for 32-bit hashing

Reply via email to