[ 
https://issues.apache.org/jira/browse/CASSANDRA-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16216001#comment-16216001
 ] 

Michael Kjellman commented on CASSANDRA-13291:
----------------------------------------------

[~beobal] The profiling I did was with Murmur3Partitioner, so RP was completely 
out of the picture.

I do see the usage of {{MD5Digest}} in CASSANDRA-10786, but that was only just 
added, which I think is why I missed it. Given that it's a static final 
reference to an object that includes an MD5, I'm not exactly sure how to make 
this play nicely with a rolling upgrade in the future "change to something 
other than MD5" ticket.

The goal of this ticket (and the follow-up one, once we get this fun refactoring 
out of the way) was to deal with server-to-server communication, not anything 
that goes through the native transport and is client-to-server. Looking at how 
CASSANDRA-10786 was implemented, we would need to bump the native transport 
protocol version, as MD5 is assumed and is part of the protocol at this point. 
I think we could handle that in a third ticket covering straggler MD5 usages 
like that one.

Another usage of the thread-local (TL) is {{MD5Digest#compute(String toHash)}}, 
which is only used by 
{{QueryProcessor#computeId(String queryString, String keyspace)}}. Again, I 
think this could be taken care of in a separate client native transport MD5 
ticket. So, tl;dr: the explicit goal of this ticket (and the smaller follow-up 
one to switch to murmur or something else once we've profiled it) was to deal 
with all usages of MD5 in server-to-server communication, and not anything to 
do with client MD5 usage at this time, although clearly we would want to 
address that usage in the future too.
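
For anyone following along, here is a rough sketch of the kind of thread-local 
digest pattern I mean. It is not the actual Cassandra code: the class name 
{{ThreadLocalDigestSketch}}, the {{toHex}} helper and the sample query string 
are made up for illustration, and it only uses the JDK's {{MessageDigest}}.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration only; not the real MD5Digest class.
final class ThreadLocalDigestSketch {
    // MessageDigest instances are not thread-safe, so each thread keeps its own.
    private static final ThreadLocal<MessageDigest> MD5 = ThreadLocal.withInitial(() -> {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    });

    // Hash a string with the calling thread's digest instance.
    static byte[] compute(String toHash) {
        MessageDigest md = MD5.get();
        md.reset(); // clear any state left over from an earlier, unfinished use
        return md.digest(toHash.getBytes(StandardCharsets.UTF_8));
    }

    // Render the digest bytes as lowercase hex.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex(compute("SELECT * FROM ks.tbl")));
    }
}
```

The point is that the algorithm name is baked into a static reference, and any 
id derived from it that is shared with clients would change if the algorithm 
did.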

Re: {{GuidGenerator}}, I'd personally prefer to leave that as a utility class 
rather than pulling it all into RandomPartitioner. I know that it's only used 
by {{RandomPartitioner}}, but it's generic enough, there is no real cost, and 
I think it's cleaner to have it as a utility class.

I pushed up a small change that reverts the switch of 
{{ResultSet#computeResultMetadataId}} (very recently added as part of 
CASSANDRA-10786) to Hasher, and put a comment in.

> Replace usages of MessageDigest with Guava's Hasher
> ---------------------------------------------------
>
>                 Key: CASSANDRA-13291
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13291
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Michael Kjellman
>            Assignee: Michael Kjellman
>         Attachments: CASSANDRA-13291-trunk.diff
>
>
> During my profiling of C* I frequently see lots of aggregate time across 
> threads being spent inside the MD5 MessageDigest implementation. Given that 
> there are tons of modern alternative hashing functions better than MD5 
> available -- both in terms of providing better collision resistance and 
> actual computational speed -- I wanted to switch out our usage of MD5 for 
> alternatives (like adler128 or murmur3_128) and test for performance 
> improvements.
> Unfortunately, I found that because we use MessageDigest everywhere, 
> switching out the hashing function to something like adler128 or murmur3_128 
> (for example), which don't ship with the JDK, wasn't straightforward.
> The goal of this ticket is to propose switching out usages of MessageDigest 
> directly in favor of Hasher from Guava. This means going forward we can 
> change a single line of code to switch the hashing algorithm being used 
> (assuming there is an implementation in Guava).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
