[ https://issues.apache.org/jira/browse/CASSANDRA-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073207#comment-13073207 ]
Brian Lindauer commented on CASSANDRA-2975:
-------------------------------------------
Surprising, but yes. It's dramatically faster. The MurmurHash author reports a
50% speedup over v2 at http://code.google.com/p/smhasher/wiki/MurmurHash3. I
ran my own simple benchmark on the Java version, comparing the existing
MurmurHash.hash64() function to the MurmurHash.hash3_x64_128() I added, and
found an even larger advantage: about 6.4x in the run below. The improvement
is so large that I suspect there may be a flaw in my test, but here it is:
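{code:java}
// Fragment: cnt (iteration count), key (the String to hash), start and end
// are declared by the surrounding test; strToByteBuffer() is a local helper.
start = System.currentTimeMillis();
long[] reta = {0, 0};
ByteBuffer buf = strToByteBuffer(key);
for (int i = 0; i < cnt; i++)
{
    buf.clear();
    // The previous result is fed back in as the seed, so each call depends
    // on the one before it.
    reta = MurmurHash.hash3_x64_128(buf, 0, key.length(), (int) reta[0]);
}
end = System.currentTimeMillis();
System.err.println("Ran v3 " + cnt + " times in " + (end - start) + " ms.");
{code}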
The v2 run is the same loop around MurmurHash.hash64().
Output:
{code}
Ran v2 100000000 times in 19993 ms.
Ran v3 100000000 times in 3104 ms.
{code}
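One way to check whether a gap this large comes from the test rather than the
hash is to warm up the JIT before timing and to keep every result live. The
following is only a sketch of such a harness, not the test I ran: HashBench,
run() and time() are made-up names, and CRC32 from the JDK stands in for the
hash so the example compiles without Cassandra on the classpath.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.function.ToLongFunction;
import java.util.zip.CRC32;

// Hypothetical timing harness; names are illustrative, not from Cassandra.
final class HashBench
{
    /** Hashes buf `iterations` times and returns the sum of all results. */
    static long run(ToLongFunction<ByteBuffer> fn, ByteBuffer buf, int iterations)
    {
        long sink = 0;
        for (int i = 0; i < iterations; i++)
        {
            buf.rewind();                 // start from position 0 each time
            sink += fn.applyAsLong(buf);  // accumulate so the result is used
        }
        return sink;
    }

    /** Runs a warm-up pass, then a timed pass. Returns elapsed nanoseconds. */
    static long time(String label, ToLongFunction<ByteBuffer> fn, ByteBuffer buf,
                     int warmup, int iterations)
    {
        long sink = run(fn, buf, warmup); // give the JIT a chance to compile fn
        long start = System.nanoTime();
        sink += run(fn, buf, iterations);
        long elapsed = System.nanoTime() - start;
        // Printing the sink keeps the accumulated results observable.
        System.err.println(label + ": " + iterations + " iterations in "
                           + elapsed / 1_000_000 + " ms (sink " + sink + ")");
        return elapsed;
    }

    /** Stand-in hash: CRC32 over the buffer's remaining bytes. */
    static long crc32(ByteBuffer buf)
    {
        CRC32 crc = new CRC32();
        crc.update(buf);
        return crc.getValue();
    }

    public static void main(String[] args)
    {
        ByteBuffer key = ByteBuffer.wrap("some row key".getBytes(StandardCharsets.UTF_8));
        time("crc32 (stand-in)", HashBench::crc32, key, 100_000, 1_000_000);
    }
}
```

To compare v2 and v3 this way, each would be passed to time() as its own
function, ideally in both orders, since whichever runs first also pays for
class loading and compilation.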
FWIW, I also generated random strings and seeds, ran them through both the
reference implementation and the Java port, and compared the output. I found
no differences.
> Upgrade MurmurHash to version 3
> -------------------------------
>
> Key: CASSANDRA-2975
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2975
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Affects Versions: 0.8.3
> Reporter: Brian Lindauer
> Priority: Trivial
>
> MurmurHash version 3 was finalized on June 3. It provides an enormous speedup
> and increased robustness over version 2, which is implemented in Cassandra.
> Information here:
> http://code.google.com/p/smhasher/
> The reference implementation is here:
> http://code.google.com/p/smhasher/source/browse/trunk/MurmurHash3.cpp?spec=svn136&r=136
> I have already done the work to port the (public domain) reference
> implementation to Java in the MurmurHash class and updated the BloomFilter
> class to use the new implementation:
> https://github.com/lindauer/cassandra/commit/cea6068a4a3e5d7d9509335394f9ef3350d37e93
> Apart from the faster hash time, the new version only requires one call to
> hash() rather than 2, since it returns 128 bits of hash instead of 64.
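To illustrate the last point in the description: a 128-bit result can be
treated as two 64-bit values, which is enough to derive every bit position a
Bloom filter needs from a single hash call. The sketch below assumes the
filter combines the two halves by double hashing (h1 + i * h2); that scheme,
and the names BloomBuckets and bucketsFor, are assumptions for illustration
and are not copied from the linked commit.

```java
import java.util.Arrays;

// Hypothetical helper: derive Bloom filter bit positions from one 128-bit hash.
final class BloomBuckets
{
    /**
     * h1 and h2 are the two 64-bit halves of a single 128-bit hash,
     * hashCount is the number of bit positions wanted, and numBits is the
     * size of the filter in bits.
     */
    static long[] bucketsFor(long h1, long h2, int hashCount, long numBits)
    {
        if (numBits <= 0)
            throw new IllegalArgumentException("numBits must be positive");
        long[] result = new long[hashCount];
        for (int i = 0; i < hashCount; i++)
        {
            // Double hashing: the i-th value is h1 + i * h2 (wrapping on
            // overflow). floorMod keeps the index in [0, numBits) even when
            // the sum is negative.
            result[i] = Math.floorMod(h1 + (long) i * h2, numBits);
        }
        return result;
    }

    public static void main(String[] args)
    {
        // Made-up hash halves; in practice they would be the two longs
        // returned by the 128-bit hash function.
        long[] buckets = bucketsFor(0x9E3779B97F4A7C15L, 0xC2B2AE3D27D4EB4FL, 5, 1L << 20);
        System.out.println(Arrays.toString(buckets));
    }
}
```

With a 64-bit hash, the second value has to come from a second hash call;
with 128 bits, both values come out of the same call.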