[ https://issues.apache.org/jira/browse/FLINK-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414110#comment-15414110 ]
Stefan Richter commented on FLINK-3623: --------------------------------------- In my opinion, our implementation of Murmur might be too complicated and computationally expensive for simply hashing an int. Most parts of the used algorithm are only required for arbitrary byte[] keys. What matters for mixing the bits of an int is the finalizer part of Murmur3, which is: h ^= h >> 16; h *= 0x85ebca6b; h ^= h >> 13; h *= 0xc2b2ae35; h ^= h >> 16; If I remember correctly, the purpose of the remaining code is to generate the intermediate hash from blocks over variable size byte[], whereas the purpose of the finalizer is to actually increase entropy and reduce collisions of the intermediate hash. Restricting the code to these lines could speed up hashing significantly, which might be even more relevant in case hash calculations are also used to determine keygroups. > Adjust MurmurHash algorithm > --------------------------- > > Key: FLINK-3623 > URL: https://issues.apache.org/jira/browse/FLINK-3623 > Project: Flink > Issue Type: Improvement > Components: Distributed Coordination > Affects Versions: 1.1.0 > Reporter: Greg Hogan > Assignee: Greg Hogan > Priority: Trivial > Fix For: 1.1.0 > > > Flink's MurmurHash implementation differs from the published algorithm. > From Flink's MathUtils.java: > {code} > code *= 0xe6546b64; > {code} > The Murmur3_32 algorithm as described by > [Wikipedia|https://en.wikipedia.org/wiki/MurmurHash]: > {code} > m ← 5 > n ← 0xe6546b64 > hash ← hash × m + n > {code} > and in Guava's Murmur3_32HashFunction.java: > {code} > h1 = h1 * 5 + 0xe6546b64; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)