[
https://issues.apache.org/jira/browse/CASSANDRA-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16208846#comment-16208846
]
Jason Brown commented on CASSANDRA-13291:
-----------------------------------------
[~beobal] Thanks for doing the hard part of this review!
bq. if we want to go back to just using the old thread local here or not
I think the {{Hasher}} instance will live so briefly (just the scope of the
{{#hashToBigInteger()}} function) it's probably not worth mucking with the
{{ThreadLocal}} anymore.
trivial/petty nits:
- In about 2-3 places in the non-test code, you call
{{HashingUtils.CURRENT_HASH_FUNCTION#newHasher()}}. Part of me wants to hide
away the {{CURRENT_HASH_FUNCTION}} and just have a function
{{HashingUtils#newHasher()}}, which knows to use the {{CURRENT_HASH_FUNCTION}}.
wdyt? (not a blocker for +1'ing, btw)
- in {{RandomPartitioner}}, {{#hashToBigInteger()}} is called four times from
within the same class, and only from this class. Can we a) make the function
private, and b) instead of {{RandomPartitioner#hashToBigInteger()}} can we just
call {{#hashToBigInteger()}}?
- I'm 99% sure {{Validator.CountingHasher#putObject()}} is not being called,
and it's just there to fill out the interface. Can you add a quick note that
we're not expecting it to be be used, and thus not counting the bytes?
- Once again on the "petty, but let's make it correct" front,
{{Validator.CountingHasher#putUnencodedChars()}} and
{{Validator.CountingHasher#putString()}} both call {{CharSequence#length()}} to
get the value to add to the {{count}}. According to the javadoc for
{{CharSequence#length()}} reads:
{code}
Returns the length of this character sequence. The length is the number of
16-bit chars in the sequence.
{code}
At a minimum we should multiply the output of {{CharSequence#length()}} by 2 to
get the number of 8-bit bytes to correctly increment {{count}}; we should also
add a small comment, as well.
- {{GuidGenerator}} - remove import of {{RandomParitioner}}
I think we're almost there!
> Replace usages of MessageDigest with Guava's Hasher
> ---------------------------------------------------
>
> Key: CASSANDRA-13291
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13291
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Michael Kjellman
> Assignee: Michael Kjellman
> Attachments: CASSANDRA-13291-trunk.diff
>
>
> During my profiling of C* I frequently see lots of aggregate time across
> threads being spent inside the MD5 MessageDigest implementation. Given that
> there are tons of modern alternative hashing functions better than MD5
> available -- both in terms of providing better collision resistance and
> actual computational speed -- I wanted to switch out our usage of MD5 for
> alternatives (like adler128 or murmur3_128) and test for performance
> improvements.
> Unfortunately, I found given the fact we use MessageDigest everywhere --
> switching out the hashing function to something like adler128 or murmur3_128
> (for example) -- which don't ship with the JDK -- wasn't straight forward.
> The goal of this ticket is to propose switching out usages of MessageDigest
> directly in favor of Hasher from Guava. This means going forward we can
> change a single line of code to switch the hashing algorithm being used
> (assuming there is an implementation in Guava).
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]