Cyril Scetbon created CASSANDRA-11933:
-----------------------------------------
Summary: Improve Repair performance
Key: CASSANDRA-11933
URL: https://issues.apache.org/jira/browse/CASSANDRA-11933
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Cyril Scetbon
During a full repair on a cluster of about 60 nodes, I observed that this
stage can take a significant share of the time (up to 60 percent):
https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/service/StorageService.java#L2983-L2997
It is simply caused by the fact that
https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/service/ActiveRepairService.java#L189
calls {code}ss.getLocalRanges(keyspaceName){code} every time, and that call takes
more than 99% of that time. The call takes 600ms when there is no load on the
cluster and more when there is. So for 10k ranges, it takes at least 1.5 hours
(10,000 x 600ms = 6,000s, i.e. 100 minutes) just to compute ranges.
Underneath it calls
[ReplicationStrategy.getAddressRanges|https://github.com/apache/cassandra/blob/3dcbe90e02440e6ee534f643c7603d50ca08482b/src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java#L170]
which can get pretty inefficient.
The result of *ss.getLocalRanges(keyspaceName)* should be cached so that repair
does not spend hours recomputing it.
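
To illustrate the caching idea, here is a minimal sketch and not a proposed patch. It assumes Java 8 or later, whereas the 2.1 branch may target an older Java. LocalRangesCache is a hypothetical helper that does not exist in Cassandra; the loader function stands in for ss.getLocalRanges(keyspaceName), and topologyVersion is an assumed counter that changes whenever ring membership or token ownership changes, so stale ranges are not served after a topology change.

```java
import java.util.Collection;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical helper (not part of Cassandra): memoizes an expensive per-keyspace lookup. */
final class LocalRangesCache<R> {

    /** A cached result together with the topology version it was computed for. */
    private static final class Entry<R> {
        final long version;
        final Collection<R> ranges;

        Entry(long version, Collection<R> ranges) {
            this.version = version;
            this.ranges = ranges;
        }
    }

    private final ConcurrentHashMap<String, Entry<R>> cache = new ConcurrentHashMap<>();

    // Stand-in for ss.getLocalRanges(keyspaceName); supplied by the caller.
    private final Function<String, Collection<R>> loader;

    // Assumption: some counter that changes whenever the ring topology changes.
    private final LongSupplier topologyVersion;

    LocalRangesCache(Function<String, Collection<R>> loader, LongSupplier topologyVersion) {
        this.loader = loader;
        this.topologyVersion = topologyVersion;
    }

    /** Returns the cached ranges, recomputing only if the topology version has changed. */
    Collection<R> get(String keyspaceName) {
        final long current = topologyVersion.getAsLong();
        Entry<R> entry = cache.compute(keyspaceName, (ks, old) ->
                (old != null && old.version == current)
                        ? old
                        : new Entry<>(current, loader.apply(ks)));
        return entry.ranges;
    }
}
```

With such a cache, repeated calls for the same keyspace during one repair would pay the 600ms cost once per topology version instead of once per range. Whether a version counter is the right invalidation signal for Cassandra is an open design question here.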