[
https://issues.apache.org/jira/browse/CASSANDRA-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16183400#comment-16183400
]
Jason Brown commented on CASSANDRA-13291:
-----------------------------------------
I knocked out a quick [JMH
bench|https://github.com/jasobrown/cassandra/commit/cd35b1a771a74c2bf1d3bf2c0916967e74821385]
to see what the difference between {{MessageDigest}} and {{Hasher}} would be.
I selected Guava's MD5 and murmur3_128 hashers for comparison. Here's what I
found:
{noformat}
[java] Benchmark                            (bufferSize)  Mode  Cnt     Score      Error  Units
[java] HashingBench.benchHasherMD5                    31  avgt    5   336.613 ±   18.826  ns/op
[java] HashingBench.benchHasherMD5                   131  avgt    5   709.226 ±   19.489  ns/op
[java] HashingBench.benchHasherMD5                   517  avgt    5  1800.091 ±   37.748  ns/op
[java] HashingBench.benchHasherMD5                  2041  avgt    5  6275.607 ±  623.008  ns/op
[java] HashingBench.benchHasherMurmur3_128            31  avgt    5   260.859 ±   39.229  ns/op
[java] HashingBench.benchHasherMurmur3_128           131  avgt    5   421.268 ±   68.287  ns/op
[java] HashingBench.benchHasherMurmur3_128           517  avgt    5   861.577 ±   68.423  ns/op
[java] HashingBench.benchHasherMurmur3_128          2041  avgt    5  2863.952 ±  314.205  ns/op
[java] HashingBench.benchMessageDigestMD5             31  avgt    5   787.373 ±   69.869  ns/op
[java] HashingBench.benchMessageDigestMD5            131  avgt    5  1264.677 ±  117.790  ns/op
[java] HashingBench.benchMessageDigestMD5            517  avgt    5  2822.846 ±  178.416  ns/op
[java] HashingBench.benchMessageDigestMD5           2041  avgt    5  9611.875 ± 1760.809  ns/op
{noformat}
Naively, I used byte arrays of four different sizes, updated the hasher/digest,
and got the hashed result. I selected buffer sizes that are close to, but
intentionally not, powers of 2. In this run the Guava {{Hasher}}s do indeed
perform better than {{MessageDigest}}: roughly 1.5x to 2.3x faster for MD5 and
roughly 3x faster for murmur3_128.
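For readers who don't want to open the commit, the operation being timed in the {{MessageDigest}} case might look roughly like the sketch below. This is an illustration only, not the code from the linked bench: the class and method names ({{DigestSketch}}, {{md5}}, {{toHex}}) are hypothetical, it uses only the JDK, and it does no timing (the real bench uses JMH). Whether the actual bench creates the digest inside the measured method is an assumption here.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Random;

// Hypothetical sketch, not the HashingBench class from the linked commit.
public class DigestSketch {
    // Buffer sizes taken from the (bufferSize) column of the results above.
    static final int[] BUFFER_SIZES = {31, 131, 517, 2041};

    // One unit of work: create an MD5 digest, feed it the buffer, get the hash.
    // With Guava on the classpath, the comparable Hasher call would be
    // Hashing.md5().newHasher().putBytes(input).hash().asBytes().
    static byte[] md5(byte[] input) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            digest.update(input);
            return digest.digest();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    // Render a hash as lowercase hex for display.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Random random = new Random(42); // fixed seed so runs are repeatable
        for (int size : BUFFER_SIZES) {
            byte[] buffer = new byte[size];
            random.nextBytes(buffer);
            System.out.println(size + " bytes -> " + toHex(md5(buffer)));
        }
    }
}
```

Swapping the body of {{md5}} for the Guava call noted in the comment is the kind of one-line change the ticket is after; selecting murmur3_128 instead would then only mean changing which {{Hashing}} factory method is called.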
> Replace usages of MessageDigest with Guava's Hasher
> ---------------------------------------------------
>
> Key: CASSANDRA-13291
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13291
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Michael Kjellman
> Assignee: Michael Kjellman
> Attachments: CASSANDRA-13291-trunk.diff
>
>
> During my profiling of C* I frequently see lots of aggregate time across
> threads being spent inside the MD5 MessageDigest implementation. Given that
> there are tons of modern alternative hashing functions better than MD5
> available -- both in terms of providing better collision resistance and
> actual computational speed -- I wanted to switch out our usage of MD5 for
> alternatives (like adler128 or murmur3_128) and test for performance
> improvements.
> Unfortunately, because we use MessageDigest everywhere, switching the
> hashing function to something like adler128 or murmur3_128 (for example),
> neither of which ships with the JDK, wasn't straightforward.
> The goal of this ticket is to propose switching out usages of MessageDigest
> directly in favor of Hasher from Guava. This means going forward we can
> change a single line of code to switch the hashing algorithm being used
> (assuming there is an implementation in Guava).