weijietong commented on issue #1662: DRILL-6825: apply different hash algorithms to different data types
URL: https://github.com/apache/drill/pull/1662#issuecomment-469236472

The IntegerHashing method is also used in ClickHouse for integer types (see the intHash32 method in https://github.com/yandex/ClickHouse/blob/master/dbms/src/Common/HashTable/Hash.h). ClickHouse chooses its hashing method carefully according to the data type and key width, which is worth learning from. As you mentioned, Murmur3Hash does not perform well on short integer keys, so it is better to use IntegerHash for integer keys.

I had already read the discussion of the Boost implementation that you mentioned. Still, I think it is reasonable for Boost, as a base library, to keep its current implementation.

I kept the seed out of the hash32 function and brought in Boost's hash_combine method because I want to change the current hashing strategy later. For the multi-key case, I plan to move from the row-iterate model, hash32(hash32(hash32)), to a column-combine model, `hash32() hash_combine hash32() hash_combine hash32()`. The row-iterate model has a data dependency between successive hash32 calls, which hurts CPU pipeline performance.

Other hashing methods I know of can be found at https://github.com/benalexau/hash-bench, a collection of Java hashing methods. The benchmark I ran showed that city_1_1 from https://github.com/OpenHFT/Zero-Allocation-Hashing/blob/master/src/main/java/net/openhft/hashing/LongHashFunction.java performs best at key widths of 32 and 64 bytes.

I also wonder whether we could do the join keys' data type implication at the project node later, so that the HashJoin and Exchange nodes can also benefit from this PR.
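
To make the difference between the row-iterate and column-combine models concrete, here is a minimal Java sketch. It is not the code in this PR: the class and method names are hypothetical, Murmur3's 32-bit finalizer stands in for the integer hash (it is not ClickHouse's intHash32), and `hashCombine` follows the formula of Boost's classic `hash_combine` translated to 32-bit Java ints.

```java
final class HashCombineSketch {

    // Stand-in integer hash: the Murmur3 32-bit finalizer (fmix32).
    // Assumption for illustration only; the PR uses its own IntegerHashing.
    static int mix32(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Boost-style hash_combine: seed ^= hash + 0x9e3779b9 + (seed << 6) + (seed >> 2).
    // The unsigned shift (>>>) mirrors Boost's use of an unsigned seed.
    static int hashCombine(int seed, int hash) {
        return seed ^ (hash + 0x9e3779b9 + (seed << 6) + (seed >>> 2));
    }

    // Row-iterate model: hash32(k3, hash32(k2, hash32(k1, seed))).
    // Each full hash needs the previous result as its seed, so the calls form a serial chain.
    static int rowIterate(int[] keys, int seed) {
        int h = seed;
        for (int key : keys) {
            h = mix32(key ^ h);
        }
        return h;
    }

    // Column-combine model: hash32(k1) hash_combine hash32(k2) hash_combine hash32(k3).
    // The mix32 calls do not depend on each other; only the cheap combine step is serial.
    static int columnCombine(int[] keys) {
        int combined = 0;
        for (int key : keys) {
            combined = hashCombine(combined, mix32(key));
        }
        return combined;
    }
}
```

For a three-column key, `HashCombineSketch.columnCombine(new int[] {k1, k2, k3})` computes three independent per-column hashes and then folds them together, whereas `rowIterate` cannot start hashing `k2` until the hash of `k1` is known. In a vectorized engine the per-column hashes would presumably be computed over a whole batch at a time, which this scalar sketch does not show.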

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
