Cyril Scetbon created CASSANDRA-11933:
-----------------------------------------

             Summary: Improve Repair performance
                 Key: CASSANDRA-11933
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11933
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Cyril Scetbon


During a full repair on a cluster of about 60 nodes, I've seen that this 
stage can account for a significant share of the repair time (up to 60 percent):

https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/service/StorageService.java#L2983-L2997

It's simply caused by the fact that 
https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/service/ActiveRepairService.java#L189
 calls {code}ss.getLocalRanges(keyspaceName){code} every time, and that call 
accounts for more than 99% of that time. It takes 600 ms when there is no load 
on the cluster and more when there is. So for 10k ranges (10,000 x 0.6 s = 
6,000 s), it takes at least 1.5 hours just to compute ranges. 

Underneath it calls 
[ReplicationStrategy.getAddressRanges|https://github.com/apache/cassandra/blob/3dcbe90e02440e6ee534f643c7603d50ca08482b/src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java#L170]
 which can get pretty inefficient.

*ss.getLocalRanges(keyspaceName)* should be cached to avoid having to spend 
hours on it.
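
To illustrate the idea, here is a minimal sketch of a per-keyspace cache. It is not Cassandra code: the class LocalRangesCache, its loader function and the invalidate() method are hypothetical names, and the loader only stands in for the expensive ss.getLocalRanges(keyspaceName) call. It also assumes the cached ranges stay valid until the ring or the keyspace's replication settings change, at which point the cache would have to be cleared.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical helper (not part of Cassandra): memoizes local ranges per keyspace.
final class LocalRangesCache<R> {
    private final ConcurrentMap<String, Collection<R>> cache = new ConcurrentHashMap<>();
    // Stand-in for the expensive lookup, e.g. ss.getLocalRanges(keyspaceName).
    private final Function<String, Collection<R>> loader;

    LocalRangesCache(Function<String, Collection<R>> loader) {
        this.loader = loader;
    }

    // Runs the loader at most once per keyspace until invalidate() is called.
    Collection<R> get(String keyspaceName) {
        return cache.computeIfAbsent(keyspaceName, loader);
    }

    // Assumption: would be called whenever the topology or replication changes.
    void invalidate() {
        cache.clear();
    }

    public static void main(String[] args) {
        AtomicInteger expensiveCalls = new AtomicInteger();
        LocalRangesCache<String> ranges = new LocalRangesCache<>(ks -> {
            expensiveCalls.incrementAndGet(); // counts the simulated slow calls
            return Arrays.asList("(0,100]", "(100,200]");
        });
        // Simulate 10k repair ranges all asking for the same keyspace.
        for (int i = 0; i < 10_000; i++) {
            ranges.get("ks1");
        }
        System.out.println("expensive calls: " + expensiveCalls.get()); // prints 1
    }
}
```

With such a cache the 600 ms cost would be paid once per keyspace rather than once per range. The hard part, which this sketch leaves out, is deciding when to invalidate.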



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
