[GitHub] Ben-Zvi commented on issue #1662: DRILL-6825: apply different hash algorithms to different data types

GitBox Fri, 01 Mar 2019 23:12:24 -0800

Ben-Zvi commented on issue #1662: DRILL-6825: apply different hash algorithms 
to different data types
URL: https://github.com/apache/drill/pull/1662#issuecomment-468894618
 
 
   Adding the hash32() method to the ValueVector is useful; however, picking 
algorithms just because they appear in a paper or are well known may not be 
good enough. At my previous employer I evaluated many hash functions by 
actually running standalone performance and distribution tests. One clear 
result back then was that murmur performed well on long strings, but less well 
on shorter data.
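
   As an illustration of the kind of standalone distribution test mentioned 
above, here is a minimal Java sketch. The class and method names are 
hypothetical (they come neither from Drill nor from the earlier evaluation). 
It only counts how sequential integer keys fall into buckets; a real test 
would also cover strings of different lengths, timing, and realistic key sets.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical helper, not part of Drill: measures how evenly a candidate
// 32-bit hash spreads sequential integer keys over a fixed number of buckets.
final class HashDistributionSketch {

    // Hashes the keys 0 .. numKeys-1 and counts how many land in each bucket.
    public static int[] bucketCounts(IntUnaryOperator hash, int numKeys, int numBuckets) {
        int[] counts = new int[numBuckets];
        for (int key = 0; key < numKeys; key++) {
            // floorMod keeps the bucket index non-negative for negative hashes.
            int bucket = Math.floorMod(hash.applyAsInt(key), numBuckets);
            counts[bucket]++;
        }
        return counts;
    }

    // Returns the size of the fullest bucket; the ideal is numKeys / numBuckets.
    public static int maxLoad(int[] counts) {
        int max = 0;
        for (int c : counts) {
            max = Math.max(max, c);
        }
        return max;
    }

    public static void main(String[] args) {
        int keys = 1 << 16;
        int buckets = 1 << 10;
        // Placeholder candidate: a simple multiplicative hash (an assumption,
        // chosen only so that the harness has something to measure).
        int worst = maxLoad(bucketCounts(k -> k * 0x9e3779b9, keys, buckets));
        System.out.println("ideal load " + (keys / buckets) + ", worst load " + worst);
    }
}
```

   Any candidate (the IntegerHashing functions, the existing HashHelper one, 
murmur) could be passed in as the lambda and compared on the same key set.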
   
   How do the new hash functions in IntegerHashing compare with the existing 
one in HashHelper?
   
   The Boost implementation of hash_combine looks "fishy" (e.g., some bits 
are used more than others); see further critique at 
https://stackoverflow.com/questions/35985960/c-why-is-boosthash-combine-the-best-way-to-combine-hash-values
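
   For reference, the classic Boost formulation discussed in that question is 
`seed ^= h + 0x9e3779b9 + (seed << 6) + (seed >> 2)`. Below is a sketch of it 
translated to 32-bit Java ints; the class name is hypothetical and this is not 
necessarily the exact code in the PR.

```java
// Hypothetical translation of the classic Boost hash_combine step to Java.
final class HashCombineSketch {

    // Mixes valueHash into seed. Boost works on unsigned values, so the
    // right shift is written as >>> (logical) rather than >> (arithmetic).
    public static int hashCombine(int seed, int valueHash) {
        return seed ^ (valueHash + 0x9e3779b9 + (seed << 6) + (seed >>> 2));
    }

    public static void main(String[] args) {
        // Combine the hashes of two example fields, starting from seed 0.
        int h = hashCombine(0, Integer.hashCode(42));
        h = hashCombine(h, "drill".hashCode());
        System.out.println(Integer.toHexString(h));
    }
}
```

   Note that the seed is shifted by only 6 and 2 bits and the new hash is 
simply added, so there is no multiplication or final avalanche step to spread 
the influence of each input bit.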
  
   
   Why can't the seed be given directly to the hash function instead of 
being "combined" later?
   
   Another good hash function I used in the past (I don't recall its name) 
worked with a table of 256 prime numbers. Starting with the seed, the code 
used each input byte as an index into the table: rotate the old value, XOR it 
with the mapped value, and continue with the next byte.
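
   The rotate-and-XOR scheme described above could look roughly like the Java 
sketch below. The details are assumptions: the original table contents and 
rotation amount are not known, so the sketch fills the table with the first 
256 primes and rotates by one bit. A production table would presumably use 
values spread across all 32 bits. The class and method names are hypothetical.

```java
// Hypothetical sketch of a table-driven byte hash: rotate, then XOR with a
// table entry selected by the input byte. Not taken from any known library.
final class TableRotateHashSketch {

    // Assumption: the original rotation amount is unknown; 1 bit is used here.
    private static final int ROTATE_BITS = 1;

    // Assumption: placeholder table holding the first 256 primes (2, 3, 5, ...).
    private static final int[] TABLE = firstPrimes(256);

    // Generates the first `count` primes by trial division against the
    // primes found so far.
    private static int[] firstPrimes(int count) {
        int[] primes = new int[count];
        int found = 0;
        for (int candidate = 2; found < count; candidate++) {
            boolean isPrime = true;
            for (int i = 0; i < found && primes[i] * primes[i] <= candidate; i++) {
                if (candidate % primes[i] == 0) {
                    isPrime = false;
                    break;
                }
            }
            if (isPrime) {
                primes[found++] = candidate;
            }
        }
        return primes;
    }

    // The seed is the starting value, so no separate combine step is needed.
    public static int hash(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            // b & 0xff maps the signed byte to a table index in 0..255.
            h = Integer.rotateLeft(h, ROTATE_BITS) ^ TABLE[b & 0xff];
        }
        return h;
    }
}
```

   For example, `TableRotateHashSketch.hash(bytes, seed)` with an empty array 
returns the seed unchanged, and each further byte costs one rotate, one table 
lookup, and one XOR.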
   
   Things may perform differently in Java, though. 
   Also, do you know of any open source hash functions we could simply import 
instead of writing the code in Drill?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
