[
https://issues.apache.org/jira/browse/CASSANDRA-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150931#comment-13150931
]
David Allsopp commented on CASSANDRA-2975:
------------------------------------------
Does anyone have any data on what the typical key size is (i.e. the average
input size for the hash)?
I have a couple of optimisations for the MurmurHash3 implementation that I
think give another 10-40% speedup, particularly for smaller values (e.g. 30%
speedup for buffer lengths under 256 bytes) and no worse for large values (tens
of KB). These results were on a AMD Phenom II X6 1055T @ 2.80 GHz, under 64-bit
Windows 7, Java 1.6.0_27.
Firstly, inline the {rotl64}} calls , e.g. so that
{noformat}
k1 = rotl64(k1, 31);
{noformat}
becomes
{noformat}
k1 = (k1 << 31) | (k1 >>> 33);
{noformat}
Secondly, rather than a large {{switch-case}} to handle the 'tail', use nested
{{if-else}} to form a simple binary search. Particularly for relatively small
inputs, handling the 'tail' is a significant part of the computation. E.g:
{noformat}
int ln = length & 15;
if (ln > 8)
{
if (ln > 12)
{
// etc for cases 13 - 15
}
else
{
// cases 11 and 12
}
}
else
{
// etc for cases 1-7
}
{noformat}
Will try to post a proper benchmark when I've tidied it up (run out of time
today!) so anyone interested can try it on other hardware...
This latter optimisation is pretty verbose and ugly to look at - it _might_ be
just as fast, and much more concise, to lookup the offsets and shifts from an
array, and deal with the special cases 1 and 9 as, well, special cases - but
haven't benchmarked this alternative yet...
> Upgrade MurmurHash to version 3
> -------------------------------
>
> Key: CASSANDRA-2975
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2975
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Brian Lindauer
> Assignee: Brian Lindauer
> Priority: Trivial
> Labels: lhf
> Fix For: 1.1
>
> Attachments:
> 0001-Convert-BloomFilter-to-use-MurmurHash-v3-instead-of-.patch,
> 0002-Backwards-compatibility-with-files-using-Murmur2-blo.patch
>
>
> MurmurHash version 3 was finalized on June 3. It provides an enormous speedup
> and increased robustness over version 2, which is implemented in Cassandra.
> Information here:
> http://code.google.com/p/smhasher/
> The reference implementation is here:
> http://code.google.com/p/smhasher/source/browse/trunk/MurmurHash3.cpp?spec=svn136&r=136
> I have already done the work to port the (public domain) reference
> implementation to Java in the MurmurHash class and updated the BloomFilter
> class to use the new implementation:
> https://github.com/lindauer/cassandra/commit/cea6068a4a3e5d7d9509335394f9ef3350d37e93
> Apart from the faster hash time, the new version only requires one call to
> hash() rather than 2, since it returns 128 bits of hash instead of 64.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira