[ https://issues.apache.org/jira/browse/CASSANDRA-11933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cyril Scetbon updated CASSANDRA-11933:
--------------------------------------
    Description: 
During a full repair on a ~60-node cluster, I've observed that this stage can
be significant (up to 60 percent of the whole time):

https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/service/StorageService.java#L2983-L2997

This is caused by the fact that
https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/service/ActiveRepairService.java#L189
calls {code}ss.getLocalRanges(keyspaceName){code} every time (i.e. once per
range), and that call accounts for more than 99% of the time. It takes 600ms
when the cluster is idle and longer under load, so for 10k ranges, just
computing ranges takes at least 10,000 × 600ms = 6,000s (100 minutes, i.e.
more than 1.5 hours).
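To make that cost model concrete, here is a hypothetical, self-contained Java sketch (none of the class or method names below are Cassandra APIs): `localRanges` stands in for `ss.getLocalRanges(keyspaceName)`, and a counter records how often it runs when the lookup sits inside the per-range loop versus when it is hoisted out of it. It also reproduces the estimate above, assuming a flat 600ms per call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; none of these names are Cassandra APIs.
class PerRangeLookupDemo {
    // Number of times the "expensive" lookup has run.
    static final AtomicInteger lookups = new AtomicInteger();

    // Stand-in for ss.getLocalRanges(keyspaceName): assume each call is costly.
    static List<String> localRanges(String keyspace) {
        lookups.incrementAndGet();
        List<String> ranges = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            ranges.add(keyspace + ":r" + i);
        }
        return ranges;
    }

    // Pattern described in the issue: the lookup is repeated for every range.
    static int countLocalPerRangeLookup(String keyspace, List<String> toRepair) {
        int local = 0;
        for (String range : toRepair) {
            if (localRanges(keyspace).contains(range)) {
                local++;
            }
        }
        return local;
    }

    // Same result with the lookup hoisted out of the loop: one call in total.
    static int countLocalHoistedLookup(String keyspace, List<String> toRepair) {
        List<String> ownRanges = localRanges(keyspace);
        int local = 0;
        for (String range : toRepair) {
            if (ownRanges.contains(range)) {
                local++;
            }
        }
        return local;
    }

    // Total time spent in the lookup, in minutes, if every range pays for one call.
    static double estimatedMinutes(int ranges, long msPerCall) {
        return ranges * msPerCall / 60_000.0;
    }

    public static void main(String[] args) {
        List<String> toRepair = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            toRepair.add("ks:r" + (i % 5));
        }
        int a = countLocalPerRangeLookup("ks", toRepair);
        int perRangeCalls = lookups.getAndSet(0);
        int b = countLocalHoistedLookup("ks", toRepair);
        int hoistedCalls = lookups.get();
        System.out.println("per-range: " + perRangeCalls + " lookups, hoisted: "
                + hoistedCalls + " lookup(s), same result: " + (a == b));
        System.out.println("estimated lookup time: " + estimatedMinutes(10_000, 600) + " minutes");
    }
}
```

Both variants return the same answer; only the number of expensive calls differs (one per range versus one per repair).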

Under the hood, it calls
[ReplicationStrategy.getAddressRanges|https://github.com/apache/cassandra/blob/3dcbe90e02440e6ee534f643c7603d50ca08482b/src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java#L170],
which builds the range map for every endpoint in the ring and can be quite
inefficient.
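For intuition about why this lookup is expensive, the hypothetical Java model below builds an endpoint-to-ranges map for the whole ring and then keeps only the local node's entry. It is a simplification, not Cassandra's code: it assumes SimpleStrategy-like placement (walk clockwise until RF distinct nodes are found), `long` tokens, and an illustrative ring of 60 nodes with 256 tokens each. The point it shows is that the work grows with the total number of tokens times RF, and all of it is redone on every call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Simplified, hypothetical model of computing endpoint -> ranges for a whole ring.
class AddressRangesModel {
    // Counts ring positions visited while choosing replicas (a proxy for work done).
    static long placements = 0;

    // ring maps token -> owning node; TreeMap keeps tokens sorted clockwise.
    static Map<String, List<long[]>> addressRanges(TreeMap<Long, String> ring, int rf) {
        Map<String, List<long[]>> byNode = new HashMap<>();
        List<Long> tokens = new ArrayList<>(ring.keySet());
        int n = tokens.size();
        for (int i = 0; i < n; i++) {
            // Primary range of token i is (previous token, token i], wrapping at the start.
            long[] range = {tokens.get((i - 1 + n) % n), tokens.get(i)};
            // Walk clockwise from token i until rf distinct nodes are collected.
            Set<String> replicas = new LinkedHashSet<>();
            for (int j = 0; j < n && replicas.size() < rf; j++) {
                replicas.add(ring.get(tokens.get((i + j) % n)));
                placements++;
            }
            for (String node : replicas) {
                byNode.computeIfAbsent(node, k -> new ArrayList<>()).add(range);
            }
        }
        return byNode;
    }

    // The full map is built, then only the local node's entry is used.
    static List<long[]> localRanges(TreeMap<Long, String> ring, int rf, String local) {
        return addressRanges(ring, rf).getOrDefault(local, Collections.emptyList());
    }

    // Illustrative ring: `nodes` nodes, `vnodes` tokens each, interleaved evenly.
    static TreeMap<Long, String> buildRing(int nodes, int vnodes) {
        TreeMap<Long, String> ring = new TreeMap<>();
        for (int node = 0; node < nodes; node++) {
            for (int v = 0; v < vnodes; v++) {
                ring.put((long) v * nodes + node, "node" + node);
            }
        }
        return ring;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> ring = buildRing(60, 256);
        List<long[]> local = localRanges(ring, 3, "node0");
        System.out.println("tokens=" + ring.size() + " localRanges=" + local.size()
                + " placementSteps=" + placements);
    }
}
```

Under these assumptions a single call visits 15,360 tokens and performs 46,080 placement steps to return node0's 768 ranges; doing that once per repaired range multiplies the cost accordingly.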

The result of *ss.getLocalRanges(keyspaceName)* should be cached so that we
don't spend hours recomputing it.
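One possible shape for such a cache, sketched in Java under stated assumptions: results are memoized per keyspace and reused until a ring version changes. `LocalRangesCache` is a hypothetical name, not an existing class, and the `LongSupplier` version source is an assumption; the ring-version counter that Cassandra's `TokenMetadata` maintains for its own endpoint caches is one candidate, but wiring it in is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical per-keyspace cache for local ranges; not an existing Cassandra class.
final class LocalRangesCache<R> {
    // Ranges computed for one keyspace, tagged with the ring version they were computed at.
    private static final class Entry<T> {
        final long version;
        final List<T> ranges;

        Entry(long version, List<T> ranges) {
            this.version = version;
            this.ranges = ranges;
        }
    }

    private final Map<String, Entry<R>> byKeyspace = new ConcurrentHashMap<>();
    private final LongSupplier ringVersion;          // assumed to change whenever token ownership changes
    private final Function<String, List<R>> compute; // the expensive lookup being cached

    LocalRangesCache(LongSupplier ringVersion, Function<String, List<R>> compute) {
        this.ringVersion = ringVersion;
        this.compute = compute;
    }

    List<R> get(String keyspace) {
        // Read the version before computing: if the ring changes mid-computation,
        // the entry is tagged with the older version and is recomputed next time.
        long current = ringVersion.getAsLong();
        Entry<R> cached = byKeyspace.get(keyspace);
        if (cached != null && cached.version == current) {
            return cached.ranges;
        }
        // Concurrent misses may compute twice; a stale entry only causes an extra recomputation.
        List<R> fresh = Collections.unmodifiableList(new ArrayList<>(compute.apply(keyspace)));
        byKeyspace.put(keyspace, new Entry<>(current, fresh));
        return fresh;
    }

    public static void main(String[] args) {
        AtomicLong version = new AtomicLong();
        AtomicInteger computations = new AtomicInteger();
        LocalRangesCache<String> cache = new LocalRangesCache<>(version::get, ks -> {
            computations.incrementAndGet();
            return Arrays.asList(ks + ":a", ks + ":b");
        });
        for (int i = 0; i < 10_000; i++) {
            cache.get("ks1");
        }
        System.out.println("computations after 10k lookups: " + computations.get());
        version.incrementAndGet(); // simulate a topology change
        cache.get("ks1");
        System.out.println("computations after ring change: " + computations.get());
    }
}
```

In this sketch, 10,000 lookups trigger a single computation and bumping the version forces exactly one more. Invalidating on a version number rather than on a timer keeps results fresh across bootstraps, decommissions and moves, provided the version really changes on every ownership change.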

  was:
During a full repair on a ~60-node cluster, I've observed that this stage can
be significant (up to 60 percent of):

https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/service/StorageService.java#L2983-L2997

This is caused by the fact that
https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/service/ActiveRepairService.java#L189
calls {code}ss.getLocalRanges(keyspaceName){code} every time (i.e. once per
range), and that call accounts for more than 99% of the time. It takes 600ms
when the cluster is idle and longer under load, so for 10k ranges, just
computing ranges takes at least 10,000 × 600ms = 6,000s (100 minutes, i.e.
more than 1.5 hours).

Under the hood, it calls
[ReplicationStrategy.getAddressRanges|https://github.com/apache/cassandra/blob/3dcbe90e02440e6ee534f643c7603d50ca08482b/src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java#L170],
which builds the range map for every endpoint in the ring and can be quite
inefficient.

The result of *ss.getLocalRanges(keyspaceName)* should be cached so that we
don't spend hours recomputing it.


> Improve Repair performance
> --------------------------
>
>                 Key: CASSANDRA-11933
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11933
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Cyril Scetbon
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
