[jira] [Commented] (CASSANDRA-3486) Node Tool command to stop repair

Paulo Motta (JIRA) Tue, 26 Apr 2016 14:57:12 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259008#comment-15259008
 ]


Paulo Motta commented on CASSANDRA-3486:
----------------------------------------

Thanks for the feedback [~nickmbailey]. See follow-up below:

bq. If the abort is initiated on the coordinator can we return the 
success/failure of the attempt to abort on the participants as well? And vice 
versa? Similarly for the list of results when aborting all jobs.

We could, but in this initial implementation I opted to take an optimistic 
approach to keep the protocol simple and non-blocking. If for some reason there 
is a network partition and "orphaned" sessions keep running, you can always 
abort them individually later. Do you think a blocking + timeout approach would 
be preferable?

bq. Can we make sure we are testing the case where for whatever reason a 
coordinator or participant receives an abort for a repair it doesn't know about?

Sure. One of the changes in this patch that I forgot to mention is that all 
messages are validated against the repair session UUID, so if a node receives a 
message for a repair it doesn't know about, it logs the message and ignores it.
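
To illustrate the validation described above, here is a minimal sketch. It is 
not taken from the patch: the class name, method names, and log format are all 
hypothetical, and it only assumes that each node tracks its active repair 
sessions by UUID.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; none of these names come from the Cassandra code base.
class RepairMessageValidator {
    // Repair sessions this node currently knows about, keyed by session UUID.
    private final Map<UUID, String> activeSessions = new ConcurrentHashMap<>();

    void register(UUID sessionId, String description) {
        activeSessions.put(sessionId, description);
    }

    void finish(UUID sessionId) {
        activeSessions.remove(sessionId);
    }

    // Returns true if the message belongs to a known session and was accepted,
    // false if it was logged and ignored.
    boolean handle(UUID sessionId, String messageType) {
        if (!activeSessions.containsKey(sessionId)) {
            System.err.printf("Ignoring %s for unknown repair session %s%n",
                              messageType, sessionId);
            return false;
        }
        // Real message handling (for example, aborting the session) would go here.
        return true;
    }
}
```

With this shape, an abort for an unknown or already finished session is a 
harmless no-op on the receiving node, which is the case asked about above.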

bq. Since we are now tracking repairs by uuid like this, can we expose a 
progress API outside of the jmx notification process? An mbean for retrieving 
the progress/status of a repair job by uuid?

We could, but we currently don't keep state or progress information in the 
repair session. Furthermore, we clear repair session information as soon as the 
session finishes, so the list repairs stub only lists currently active repairs. 
We would therefore need to maintain progress status and provide some way to 
clear repair information after some time.
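
As a rough sketch of what keeping progress status and clearing it after some 
time could look like: the registry below is purely illustrative. The class, the 
state names, and the retention parameter are assumptions and do not exist in 
the patch or in Cassandra.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; names and states are assumptions for illustration only.
class RepairStatusRegistry {
    enum State { RUNNING, SUCCEEDED, FAILED, ABORTED }

    static final class Status {
        final State state;
        final int progressPercent;
        final long updatedAtMillis;

        Status(State state, int progressPercent, long updatedAtMillis) {
            this.state = state;
            this.progressPercent = progressPercent;
            this.updatedAtMillis = updatedAtMillis;
        }
    }

    private final Map<UUID, Status> statuses = new ConcurrentHashMap<>();
    private final long retentionMillis;

    RepairStatusRegistry(long retentionMillis) {
        this.retentionMillis = retentionMillis;
    }

    // Record the latest known state and progress of a repair.
    void update(UUID repairId, State state, int progressPercent, long nowMillis) {
        statuses.put(repairId, new Status(state, progressPercent, nowMillis));
    }

    // Returns the last recorded status, or null if unknown or already expired.
    Status get(UUID repairId) {
        return statuses.get(repairId);
    }

    // Drops finished repairs whose last update is older than the retention
    // period; running repairs are always kept. Returns the number removed.
    int expire(long nowMillis) {
        int before = statuses.size();
        statuses.values().removeIf(s -> s.state != State.RUNNING
                && nowMillis - s.updatedAtMillis > retentionMillis);
        return before - statuses.size();
    }
}
```

A status lookup by UUID could then be exposed through an mbean, with `expire` 
called periodically so finished repairs do not accumulate.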

I personally think we should go this route of making repair more stateful, which 
will not only improve monitoring but also allow us to break up a repair job 
into more decoupled subtasks, simplifying the single chain of futures we have 
today, which can be quite complex to understand and error-prone.

> Node Tool command to stop repair
> --------------------------------
>
>                 Key: CASSANDRA-3486
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3486
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>         Environment: JVM
>            Reporter: Vijay
>            Assignee: Paulo Motta
>            Priority: Minor
>              Labels: repair
>             Fix For: 2.1.x
>
>         Attachments: 0001-stop-repair-3583.patch
>
>
> After CASSANDRA-1740, If the validation compaction is stopped then the repair 
> will hang. This ticket will allow users to kill the original repair.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
