[
https://issues.apache.org/jira/browse/CASSANDRA-11933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cyril Scetbon updated CASSANDRA-11933:
--------------------------------------
Description:
During a full repair on a ~60-node cluster, I observed that this stage can take
a significant share of the total repair time (up to 60 percent):
https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/service/StorageService.java#L2983-L2997
This is simply because
https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/service/ActiveRepairService.java#L189
calls {code}ss.getLocalRanges(keyspaceName){code} every time, and that call
accounts for more than 99% of that time. A single call takes 600ms when there is
no load on the cluster and more when there is. So for 10k ranges, it takes
at least 1.5 hours just to compute ranges.
Underneath, it calls
[ReplicationStrategy.getAddressRanges|https://github.com/apache/cassandra/blob/3dcbe90e02440e6ee534f643c7603d50ca08482b/src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java#L170]
which can be quite inefficient.
The result of *ss.getLocalRanges(keyspaceName)* should be cached to avoid
spending hours on it.
was:
During a full repair on a ~ 60 nodes cluster, I've been able to see that this
stage can be significant (up to 60 percent of) :
https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/service/StorageService.java#L2983-L2997
It's merely caused by the fact that
https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/service/ActiveRepairService.java#L189
calls {code}ss.getLocalRanges(keyspaceName){code} everytime and that it takes
more than 99% of the time. This call takes 600ms when there is no load on the
cluster and more if there is. So for 10k ranges, you can imagine that it takes
at least 1.5 hours just to compute ranges.
Underneath it calls
[ReplicationStrategy.getAddressRanges|https://github.com/apache/cassandra/blob/3dcbe90e02440e6ee534f643c7603d50ca08482b/src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java#L170]
which can get pretty inefficient.
*ss.getLocalRanges(keyspaceName)* should be cached to avoid having to spend
hours on it.
> Improve Repair performance
> --------------------------
>
> Key: CASSANDRA-11933
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11933
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Cyril Scetbon
>
> During a full repair on a ~60-node cluster, I observed that this stage can
> take a significant share of the total repair time (up to 60 percent):
> https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/service/StorageService.java#L2983-L2997
> This is simply because
> https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/service/ActiveRepairService.java#L189
> calls {code}ss.getLocalRanges(keyspaceName){code} every time, and that call
> accounts for more than 99% of that time. A single call takes 600ms when there
> is no load on the cluster and more when there is. So for 10k ranges, it takes
> at least 1.5 hours just to compute ranges.
> Underneath, it calls
> [ReplicationStrategy.getAddressRanges|https://github.com/apache/cassandra/blob/3dcbe90e02440e6ee534f643c7603d50ca08482b/src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java#L170]
> which can be quite inefficient.
> The result of *ss.getLocalRanges(keyspaceName)* should be cached to avoid
> spending hours on it.
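
As an illustration of the caching idea described above, here is a minimal
sketch, not the actual Cassandra change. The class LocalRangesCache, its
loader argument, and the invalidate() hook are all hypothetical names; the
loader stands in for the expensive ss.getLocalRanges(keyspaceName) call, and
the sketch assumes something else calls invalidate() whenever ring topology
or a keyspace's replication settings change.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical helper: memoizes the local ranges computed for each keyspace.
final class LocalRangesCache<R> {
    // Stand-in for the expensive lookup (e.g. ss.getLocalRanges(keyspaceName)).
    private final Function<String, Collection<R>> loader;
    private final ConcurrentHashMap<String, Collection<R>> cache = new ConcurrentHashMap<>();

    LocalRangesCache(Function<String, Collection<R>> loader) {
        this.loader = loader;
    }

    // Runs the loader only on the first request for a keyspace;
    // later requests return the stored result.
    Collection<R> get(String keyspaceName) {
        return cache.computeIfAbsent(keyspaceName, loader);
    }

    // Assumption: the caller invokes this when token ownership may have changed,
    // otherwise the cached ranges would go stale.
    void invalidate() {
        cache.clear();
    }

    public static void main(String[] args) {
        AtomicInteger loaderCalls = new AtomicInteger();
        LocalRangesCache<String> ranges = new LocalRangesCache<>(ks -> {
            loaderCalls.incrementAndGet();          // count the expensive calls
            return Arrays.asList("(0,100]", "(100,200]");
        });

        for (int i = 0; i < 3; i++) {
            ranges.get("ks1");                      // only the first call hits the loader
        }
        System.out.println("loader calls: " + loaderCalls.get()); // prints "loader calls: 1"
    }
}
```

With a cache like this, the per-range cost during repair would be a map lookup
rather than a full recomputation. The hard part, which this sketch leaves out,
is deciding when to invalidate.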