[ https://issues.apache.org/jira/browse/CASSANDRA-13292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740157#comment-16740157 ]

JinhuaLuo commented on CASSANDRA-13292:
---------------------------------------

I have a question: neither Adler nor murmur3 is a _cryptographic_ hash, so 
different inputs may collide. That is, two different query results may 
produce the same digest value. But a digest request is used to check whether 
all replicas hold the same data for a given query, so if the hash does not 
reflect an actual difference, the comparison gives the wrong result and does 
not trigger read repair.
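
To make the concern concrete, here is a minimal sketch (my own illustration, 
not Cassandra code) that uses only the JDK. The payloads "aca" and "bab" are 
hypothetical stand-ins for two replicas' serialized results. They differ, yet 
Adler-32 produces the same checksum for both, so a digest comparison based on 
it would report a match, while MD5 tells them apart. Adler-32 is only 32 bits 
wide and is known to be weak on short inputs, so this says nothing directly 
about a 128-bit murmur3 digest, where accidental collisions should be far 
less likely.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.zip.Adler32;

final class DigestCollisionDemo {
    // 32-bit Adler-32 checksum of the payload (java.util.zip.Adler32).
    static long adler32(byte[] payload) {
        Adler32 checksum = new Adler32();
        checksum.update(payload, 0, payload.length);
        return checksum.getValue();
    }

    // 128-bit MD5 digest of the payload.
    static byte[] md5(byte[] payload) {
        try {
            return MessageDigest.getInstance("MD5").digest(payload);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the results returned by two replicas.
        byte[] replicaA = "aca".getBytes(StandardCharsets.US_ASCII);
        byte[] replicaB = "bab".getBytes(StandardCharsets.US_ASCII);

        boolean adlerMatch = adler32(replicaA) == adler32(replicaB);
        boolean md5Match = Arrays.equals(md5(replicaA), md5(replicaB));

        System.out.println("payloads equal:         " + Arrays.equals(replicaA, replicaB));
        System.out.println("Adler-32 digests match: " + adlerMatch);
        System.out.println("MD5 digests match:      " + md5Match);
    }
}
```

The collision follows from how Adler-32 sums bytes: adding 1 to the first and 
third byte and subtracting 2 from the second leaves both running sums 
unchanged.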

But I also think the digest is heavyweight and adds unnecessary overhead, 
especially when it is recalculated over large data that has not changed.

I'm wondering whether we could introduce a digest cache: if neither the 
schema nor the queried columns (or fields in complex columns) have been 
mutated, the digest request could be served directly from the cache.
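
A rough sketch of what I have in mind, assuming each cached digest can be 
tagged with some version of the underlying data (for example a counter that 
is bumped on every mutation or schema change). The class `DigestCache`, the 
`queryKey` string and the `currentVersion` parameter are all hypothetical 
names for illustration, not existing Cassandra APIs, and tracking that 
version cheaply is the part this sketch does not solve.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

final class DigestCache {
    // A cached digest together with the data version it was computed from.
    private static final class Entry {
        final long version;
        final byte[] digest;

        Entry(long version, byte[] digest) {
            this.version = version;
            this.digest = digest;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();

    // Returns the cached digest if it was computed for currentVersion;
    // otherwise recomputes it with the supplier and caches the new value.
    byte[] get(String queryKey, long currentVersion, Supplier<byte[]> computeDigest) {
        Entry entry = entries.compute(queryKey, (key, old) ->
                (old != null && old.version == currentVersion)
                        ? old
                        : new Entry(currentVersion, computeDigest.get()));
        return entry.digest;
    }

    // Explicit invalidation, e.g. when the schema changes.
    void invalidate(String queryKey) {
        entries.remove(queryKey);
    }
}
```

With this shape, a repeated digest request for unchanged data costs one map 
lookup instead of a pass over the data. The memory used by the cache and the 
cost of maintaining versions would have to be weighed against that saving.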

> Replace MessagingService usage of MD5 with something more modern
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-13292
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13292
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Legacy/Core
>            Reporter: Michael Kjellman
>            Assignee: Michael Kjellman
>            Priority: Major
>
> While profiling C* via multiple profilers, I've consistently seen a 
> significant amount of time being spent calculating MD5 digests.
> {code}
> Stack Trace   Sample Count   Percentage(%)
> sun.security.provider.MD5.implCompress(byte[], int)   264   1.566
>    sun.security.provider.DigestBase.implCompressMultiBlock(byte[], int, int)   200   1.187
>       sun.security.provider.DigestBase.engineUpdate(byte[], int, int)   200   1.187
>          java.security.MessageDigestSpi.engineUpdate(ByteBuffer)   200   1.187
>             java.security.MessageDigest$Delegate.engineUpdate(ByteBuffer)   200   1.187
>                java.security.MessageDigest.update(ByteBuffer)   200   1.187
>                   org.apache.cassandra.db.Column.updateDigest(MessageDigest)   193   1.145
>                      org.apache.cassandra.db.ColumnFamily.updateDigest(MessageDigest)   193   1.145
>                         org.apache.cassandra.db.ColumnFamily.digest(ColumnFamily)   193   1.145
>                            org.apache.cassandra.service.RowDigestResolver.resolve()   106   0.629
>                               org.apache.cassandra.service.RowDigestResolver.resolve()   106   0.629
>                                  org.apache.cassandra.service.ReadCallback.get()   88   0.522
>                                     org.apache.cassandra.service.AbstractReadExecutor.get()   88   0.522
>                                        org.apache.cassandra.service.StorageProxy.fetchRows(List, ConsistencyLevel)   88   0.522
>                                           org.apache.cassandra.service.StorageProxy.read(List, ConsistencyLevel)   88   0.522
>                                              org.apache.cassandra.service.pager.SliceQueryPager.queryNextPage(int, ConsistencyLevel, boolean)   88   0.522
>                                                 org.apache.cassandra.service.pager.AbstractQueryPager.fetchPage(int)   88   0.522
>                                                    org.apache.cassandra.service.pager.SliceQueryPager.fetchPage(int)   88   0.522
>                                                       org.apache.cassandra.cql3.statements.SelectStatement.execute(QueryState, QueryOptions)   88   0.522
>                                                          org.apache.cassandra.cql3.statements.SelectStatement.execute(QueryState, QueryOptions)   88   0.522
>                                                             org.apache.cassandra.cql3.QueryProcessor.processStatement(CQLStatement, QueryState, QueryOptions)   88   0.522
>                                                                org.apache.cassandra.cql3.QueryProcessor.process(String, QueryState, QueryOptions)   88   0.522
>                                                                   org.apache.cassandra.transport.messages.QueryMessage.execute(QueryState)   88   0.522
>                                                                      org.apache.cassandra.transport.Message$Dispatcher.messageReceived(ChannelHandlerContext, MessageEvent)   88   0.522
>                                                                         org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(ChannelHandlerContext, ChannelEvent)   88   0.522
>                                                                            org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline$DefaultChannelHandlerContext, ChannelEvent)   88   0.522
>                                                                               org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(ChannelEvent)   88   0.522
>                                                                                  org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun()   88   0.522
>                                                                                     org.jboss.netty.handler.execution.ChannelEventRunnable.run()   88   0.522
>                                                                                        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker)   88   0.522
>                                                                                           java.util.concurrent.ThreadPoolExecutor$Worker.run()   88   0.522
>                                                                                              java.lang.Thread.run()   88   0.522
> {code}
> Pending CASSANDRA-13291, it would be pretty easy to:
> # Switch out the hashing implementation from MD5 to implementations such as 
> adler128 and murmur3_128 (among others) and do some profiling to compare the 
> net improvement in latency and CPU usage.
> # Rev the MessagingService protocol version, since we can't switch the 
> algorithm from MD5 without breaking things. As we already do for changes 
> like switching from Snappy compression -> LZ4, we could switch to the new 
> hashing implementation once all peers in the cluster are upgraded and 
> support the new MessagingService version.
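
The version gating described in the second point could look roughly like the 
sketch below. It is only an illustration: the class name, the version 
constants and the `NEW_HASH` placeholder are hypothetical and do not 
correspond to actual MessagingService values. The idea is that the new digest 
is used only when every known peer advertises a messaging version that 
supports it, and MD5 remains the choice while any peer is still on an older 
version.

```java
import java.util.Collection;

final class DigestAlgorithmSelector {
    // Hypothetical first messaging version that understands the new digest;
    // the number is chosen for illustration only.
    static final int VERSION_WITH_NEW_HASH = 12;

    enum Algorithm { MD5, NEW_HASH }

    // Falls back to MD5 as long as any peer is older than the version that
    // introduced the new digest. An empty collection selects NEW_HASH,
    // because no peer requires the fallback.
    static Algorithm select(Collection<Integer> peerVersions) {
        for (int version : peerVersions) {
            if (version < VERSION_WITH_NEW_HASH) {
                return Algorithm.MD5;
            }
        }
        return Algorithm.NEW_HASH;
    }
}
```

During a rolling upgrade, a mixed set of versions such as 12, 11, 12 keeps 
MD5 in use, and the switch happens only once the last peer reports the newer 
version.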



