[ 
https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14980079#comment-14980079
 ] 

Marcus Olsson commented on CASSANDRA-10070:
-------------------------------------------

Just to clarify, the automatic scheduling is done at the node level. Each node 
"competes" with the other nodes over who has the highest need for a repair and 
then uses a CAS lock to obtain the right to run a repair. So the repair process 
would continue during an upgrade, but I assume the repair would fail as things 
stand right now and that the repair job would be retried. The problem is that 
this job would keep trying to run until it succeeded, since it has the highest 
priority, even if there are other repair jobs that could run (e.g. if only part 
of the cluster has been upgraded).
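
As a rough illustration of the competition described above, here is a minimal 
in-memory sketch. It is not the proposed implementation: CasLockTable, 
Candidate and compete are hypothetical names, and the lock table is only a 
stand-in for whatever compare-and-set mechanism the real scheduler would use 
(presumably lightweight transactions).

```python
import threading
from dataclasses import dataclass
from typing import Dict, List, Optional


class CasLockTable:
    """Hypothetical in-memory stand-in for a table updated with compare-and-set."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}
        self._mutex = threading.Lock()

    def try_acquire(self, resource: str, node: str) -> bool:
        # Succeeds only if nobody currently holds the lock for this resource.
        with self._mutex:
            if resource in self._owners:
                return False
            self._owners[resource] = node
            return True

    def release(self, resource: str, node: str) -> bool:
        # Only the current holder is allowed to release the lock.
        with self._mutex:
            if self._owners.get(resource) != node:
                return False
            del self._owners[resource]
            return True


@dataclass
class Candidate:
    node: str
    priority: int  # assumption: a higher value means a more urgent need for repair


def compete(lock_table: CasLockTable, resource: str,
            candidates: List[Candidate]) -> Optional[str]:
    """Return the node that obtained the lock, or None if no node did."""
    if not candidates:
        return None
    top = max(c.priority for c in candidates)
    winner = None
    for c in candidates:
        if c.priority < top:
            continue  # nodes with a lower need back off without touching the lock
        if lock_table.try_acquire(resource, c.node):
            winner = c.node  # on a tie, only the first CAS attempt succeeds
    return winner
```

If two nodes report the same priority, the compare-and-set step is what 
guarantees that only one of them ends up running the repair.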

To allow repairs during an upgrade scenario I think we need to have both 
CASSANDRA-7530 & CASSANDRA-8110 in place.
Until then I see two options:
* Make it possible to "pause" all repair scheduling, e.g. during upgrade 
scenarios.
* Make the repair job recognize that it cannot run at this time and allow 
another repair job to run instead.
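
The two options above could be sketched roughly as follows. This is only an 
illustration under assumptions: RepairJob, RepairScheduler, can_run, pause and 
resume are hypothetical names, not existing Cassandra APIs, and how a job would 
actually detect that it cannot run (e.g. mixed versions) is left open.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class RepairJob:
    name: str
    priority: int                 # assumption: higher value runs first
    can_run: Callable[[], bool]   # hypothetical check, e.g. "all replicas on same version"


class RepairScheduler:
    def __init__(self, jobs: Iterable[RepairJob]) -> None:
        self.jobs: List[RepairJob] = list(jobs)
        self.paused = False

    def pause(self) -> None:
        # Option 1: stop all repair scheduling, e.g. for the duration of an upgrade.
        self.paused = True

    def resume(self) -> None:
        self.paused = False

    def next_job(self) -> Optional[RepairJob]:
        if self.paused:
            return None
        # Option 2: skip jobs that cannot run right now so that a lower-priority
        # job gets a chance instead of the top job being retried forever.
        for job in sorted(self.jobs, key=lambda j: j.priority, reverse=True):
            if job.can_run():
                return job
        return None
```

With this shape, a blocked high-priority job keeps its place in the ordering and 
is picked up again as soon as its can_run check passes.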

I wouldn't mind implementing both options, since there might be scenarios when 
both are needed, even if we can repair between versions.

> Automatic repair scheduling
> ---------------------------
>
>                 Key: CASSANDRA-10070
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10070
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Olsson
>            Assignee: Marcus Olsson
>            Priority: Minor
>             Fix For: 3.x
>
>
> Scheduling and running repairs in a Cassandra cluster is usually a required 
> task, but it can be hard for new users and it requires a bit of manual 
> configuration. There are good tools out there that can be used to simplify 
> things, but wouldn't this be a good feature to have inside Cassandra? The 
> idea is to automatically schedule and run repairs, so that when you start up 
> your cluster it basically maintains itself in terms of normal anti-entropy, 
> with the possibility for manual configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
