[
https://issues.apache.org/jira/browse/CASSANDRA-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151092#comment-13151092
]
David Allsopp commented on CASSANDRA-2975:
------------------------------------------
Having slept on it, I realized that the {{switch-case}} is slow because of the
{{switch-case}} fallthrough in Java - _i.e. all 15 cases get tested every time_
(http://download.oracle.com/javase/tutorial/java/nutsandbolts/switch.html)
Adding a {{break}} to the end of each case fixes this without the verbosity of
the binary search approach, i.e:
{noformat}
switch (length & 15)
{
case 0: break;
case 1:
k1 ^= ((long) key.get(offset));
k1 *= c1;
k1 = rotl64(k1, 31);
k1 *= c2;
h1 ^= k1;
break;
case 2:
k1 ^= ((long) key.get(offset + 1)) << 8;
break;
...etc...
{noformat}
Combining this with the inlining almost doubles the speed of hashing for keys
with lengths from 1-32 bytes!
> Upgrade MurmurHash to version 3
> -------------------------------
>
> Key: CASSANDRA-2975
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2975
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Brian Lindauer
> Assignee: Brian Lindauer
> Priority: Trivial
> Labels: lhf
> Fix For: 1.1
>
> Attachments:
> 0001-Convert-BloomFilter-to-use-MurmurHash-v3-instead-of-.patch,
> 0002-Backwards-compatibility-with-files-using-Murmur2-blo.patch
>
>
> MurmurHash version 3 was finalized on June 3. It provides an enormous speedup
> and increased robustness over version 2, which is implemented in Cassandra.
> Information here:
> http://code.google.com/p/smhasher/
> The reference implementation is here:
> http://code.google.com/p/smhasher/source/browse/trunk/MurmurHash3.cpp?spec=svn136&r=136
> I have already done the work to port the (public domain) reference
> implementation to Java in the MurmurHash class and updated the BloomFilter
> class to use the new implementation:
> https://github.com/lindauer/cassandra/commit/cea6068a4a3e5d7d9509335394f9ef3350d37e93
> Apart from the faster hash time, the new version only requires one call to
> hash() rather than 2, since it returns 128 bits of hash instead of 64.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira