[
https://issues.apache.org/jira/browse/CASSANDRA-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16207505#comment-16207505
]
Sam Tunnicliffe commented on CASSANDRA-13291:
---------------------------------------------
{{RandomPartitioner::hashToBigInteger}} is double hashing its input (I think
this is what Jason was referring to in his previous comment), and so its output
doesn't match the previous implementation.
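To illustrate why double hashing changes the token, here is a minimal sketch using only the JDK. It assumes the token is derived by reading the MD5 digest as a {{BigInteger}} and taking its absolute value; the class and method names ({{DoubleHashDemo}}, {{hashOnce}}, {{hashTwice}}) are hypothetical and are not taken from the patch.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DoubleHashDemo {
    // Plain MD5 over the input; a fresh MessageDigest per call keeps the sketch simple.
    public static byte[] md5(byte[] input) {
        try {
            return MessageDigest.getInstance("MD5").digest(input);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Assumed shape of the previous behaviour: hash once, read as a non-negative integer.
    public static BigInteger hashOnce(byte[] key) {
        return new BigInteger(md5(key)).abs();
    }

    // The suspected bug: the digest of the key is itself hashed again.
    public static BigInteger hashTwice(byte[] key) {
        return new BigInteger(md5(md5(key))).abs();
    }

    public static void main(String[] args) {
        byte[] key = "example-key".getBytes(StandardCharsets.UTF_8);
        System.out.println("single hash: " + hashOnce(key));
        System.out.println("double hash: " + hashTwice(key));
        System.out.println("same token:  " + hashOnce(key).equals(hashTwice(key)));
    }
}
```

If the two values differ for a given key, existing data would map to different tokens under the new code than under the old one.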
When using RP, getting a token for a key is probably the hottest path for
hashing. The current code uses a {{ThreadLocal<MessageDigest>}}, which it resets
after use, presumably to avoid allocating a new digest on every call. Under the
covers, {{Hashing.md5().hashBytes()}} clones a prototype {{MessageDigest}}, so
this is going to result in a lot more instance creation (see
{{AbstractStreamingHashFunction::hashBytes ->
MessageDigestHashFunction::newHasher}}).
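The two allocation patterns being compared can be sketched with the JDK alone. This is an illustration under stated assumptions, not the Cassandra or Guava code: {{DigestReuseDemo}}, {{hashWithThreadLocal}} and {{hashWithClone}} are hypothetical names, and the clone-based method only approximates what the Guava call is described as doing above.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestReuseDemo {
    private static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Pattern 1: one MessageDigest per thread, reused across calls.
    private static final ThreadLocal<MessageDigest> LOCAL_MD5 =
            ThreadLocal.withInitial(DigestReuseDemo::newMd5);

    public static byte[] hashWithThreadLocal(byte[] input) {
        MessageDigest md = LOCAL_MD5.get();
        byte[] out = md.digest(input);
        // digest() already resets the instance; the explicit reset mirrors
        // the "resets after use" behaviour described in the comment.
        md.reset();
        return out;
    }

    // Pattern 2: a shared prototype that is cloned for every call,
    // so each hash allocates a new MessageDigest instance.
    private static final MessageDigest PROTOTYPE = newMd5();

    public static byte[] hashWithClone(byte[] input) {
        try {
            MessageDigest md = (MessageDigest) PROTOTYPE.clone();
            return md.digest(input);
        } catch (CloneNotSupportedException e) {
            // Fallback for providers whose digests cannot be cloned.
            return newMd5().digest(input);
        }
    }

    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] key = "example-key".getBytes(java.nio.charset.StandardCharsets.UTF_8);
        System.out.println(toHex(hashWithThreadLocal(key)));
        System.out.println(toHex(hashWithClone(key)));
    }
}
```

Both methods produce the same digest; they differ only in how many {{MessageDigest}} instances are created on the hot path, which is the cost that would need measuring outside of microbenchmarks.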
I'm not sure of the original motivations for the threadlocal, or whether those
are still justified, but it seems like we should investigate outside of
microbenchmarks before committing this.
> Replace usages of MessageDigest with Guava's Hasher
> ---------------------------------------------------
>
> Key: CASSANDRA-13291
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13291
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Michael Kjellman
> Assignee: Michael Kjellman
> Attachments: CASSANDRA-13291-trunk.diff
>
>
> During my profiling of C* I frequently see lots of aggregate time across
> threads being spent inside the MD5 MessageDigest implementation. Given that
> there are tons of modern alternative hashing functions better than MD5
> available -- both in terms of providing better collision resistance and
> actual computational speed -- I wanted to switch out our usage of MD5 for
> alternatives (like adler128 or murmur3_128) and test for performance
> improvements.
> Unfortunately, because we use MessageDigest everywhere, I found that
> switching the hashing function to something like adler128 or murmur3_128
> (neither of which ships with the JDK) wasn't straightforward.
> The goal of this ticket is to propose switching out usages of MessageDigest
> directly in favor of Hasher from Guava. This means going forward we can
> change a single line of code to switch the hashing algorithm being used
> (assuming there is an implementation in Guava).