[ 
https://issues.apache.org/jira/browse/FLINK-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414110#comment-15414110
 ] 

Stefan Richter commented on FLINK-3623:
---------------------------------------

In my opinion, our implementation of Murmur might be too complicated and 
computationally expensive for simply hashing an int.  Most parts of the used 
algorithm are only required for arbitrary byte[] keys. What matters for mixing 
the bits of an int is the finalizer part of Murmur3, which is:
  
  h ^= h >> 16;
  h *= 0x85ebca6b;
  h ^= h >> 13;
  h *= 0xc2b2ae35;
  h ^= h >> 16;

If I remember correctly, the purpose of the remaining code is to generate the 
intermediate hash from blocks over variable size byte[], whereas the purpose of 
the finalizer is to actually increase entropy and reduce collisions of the 
intermediate hash. Restricting the code to these lines could speed up hashing 
significantly, which might be even more relevant in case hash calculations are 
also used to determine keygroups.

> Adjust MurmurHash algorithm
> ---------------------------
>
>                 Key: FLINK-3623
>                 URL: https://issues.apache.org/jira/browse/FLINK-3623
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Coordination
>    Affects Versions: 1.1.0
>            Reporter: Greg Hogan
>            Assignee: Greg Hogan
>            Priority: Trivial
>             Fix For: 1.1.0
>
>
> Flink's MurmurHash implementation differs from the published algorithm.
> From Flink's MathUtils.java:
> {code}
> code *= 0xe6546b64;
> {code}
> The Murmur3_32 algorithm as described by 
> [Wikipedia|https://en.wikipedia.org/wiki/MurmurHash]:
> {code}
> m ← 5
> n ← 0xe6546b64
> hash ← hash × m + n
> {code}
> and in Guava's Murmur3_32HashFunction.java:
> {code}
> h1 = h1 * 5 + 0xe6546b64;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to