[ 
https://issues.apache.org/jira/browse/CASSANDRA-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255014#comment-15255014
 ] 

Paulo Motta commented on CASSANDRA-3486:
----------------------------------------

Attaching a [preliminary 
patch|https://github.com/pauloricardomg/cassandra/tree/3486-trunk] in case 
anyone wants to have a look or give feedback before the review-ready version.

*Current state*

* Add {{nodetool repair --list}} to list ongoing repair jobs (parent repair 
sessions) on the local node
* Add {{nodetool repair --abort <jobId>}} and {{nodetool repair --abort-all}} 
to abort a specific job or all jobs
* Any participant can abort the repair job:
** When a participant receives an abort request, it sends an abort message to 
the coordinator and aborts its local tasks
** When a coordinator receives an abort message or abort request, it sends an 
abort message to all participants and aborts its local tasks, failing the repair 
job
* Add abort support to {{StreamResultFuture}} and {{StreamSession}}
* Refactor {{ActiveRepairService}} and {{RepairMessageVerbHandler}}
* Add [dtests|https://github.com/pauloricardomg/cassandra-dtest/tree/3486] to 
abort repair on coordinator and participants on different phases (validation, 
sync, anticompaction)
* Fix races and leaks found during dtests
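
The abort propagation rules in the list above can be sketched as a small in-memory model. This is only an illustration, not the code in the patch: {{Node}}, {{onAbortRequest}}, {{onAbortMessage}} and {{newJob}} are hypothetical names, direct method calls stand in for the messaging service, and the local abort is recorded before the message is sent purely so that the sketch cannot loop.

```java
import java.util.ArrayList;
import java.util.List;

class AbortPropagationSketch {
    /** Hypothetical stand-in for a node taking part in one repair job. */
    static class Node {
        final String name;
        Node coordinator;                        // null on the coordinator itself
        final List<Node> participants = new ArrayList<>();
        boolean aborted;

        Node(String name) { this.name = name; }

        /** Operator-initiated abort, e.g. issued through nodetool on this node. */
        void onAbortRequest() {
            if (!abortLocalTasks()) return;
            if (coordinator == null) broadcast();     // coordinator: tell every participant
            else coordinator.onAbortMessage();        // participant: tell the coordinator
        }

        /** Abort message received from another node in the same job. */
        void onAbortMessage() {
            if (!abortLocalTasks()) return;
            if (coordinator == null) broadcast();
        }

        private void broadcast() {
            for (Node p : participants) p.onAbortMessage();
        }

        /** Returns false when already aborted, which keeps messages from looping. */
        private boolean abortLocalTasks() {
            if (aborted) return false;
            aborted = true;
            return true;
        }
    }

    /** Builds a job: index 0 is the coordinator, the rest are participants. */
    static List<Node> newJob(int participantCount) {
        List<Node> nodes = new ArrayList<>();
        Node coordinator = new Node("coordinator");
        nodes.add(coordinator);
        for (int i = 1; i <= participantCount; i++) {
            Node p = new Node("participant-" + i);
            p.coordinator = coordinator;
            coordinator.participants.add(p);
            nodes.add(p);
        }
        return nodes;
    }

    /** Sends an abort request to one node and reports whether every node aborted. */
    static boolean allAbortedAfterRequestOn(int nodeIndex) {
        List<Node> nodes = newJob(3);
        nodes.get(nodeIndex).onAbortRequest();
        for (Node n : nodes) if (!n.aborted) return false;
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allAbortedAfterRequestOn(0)); // request on the coordinator
        System.out.println(allAbortedAfterRequestOn(2)); // request on a participant
    }
}
```

Whichever node receives the request, the abort reaches the coordinator first and fans out from there, so no participant needs to know about the others.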

*Limitations and next steps*

While compactions have abort/stop support via 
{{CompactionManager.stopCompactionById}},
we cannot guarantee a compaction is going to be aborted during a repair abort, because 
its abort handler ({{Holder}}) is only registered during iteration via the 
{{CompactionIterator}}. If we stop the compaction before that point, the task is 
not aborted and will execute even if its parent repair session was aborted. 
Furthermore, an anti-compaction is split into multiple subcompactions, so this 
method only stops the currently running subcompaction.

In order to overcome this, I aborted the compaction task {{Future}} directly, 
which causes the task thread to be interrupted, so I check for 
{{Thread.currentThread().isInterrupted()}} during iteration and throw a 
{{CompactionInterruptedException}} if this is true, causing the compaction to 
be aborted (by brute force).
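
A minimal sketch of this brute-force approach, using only {{java.util.concurrent}}: the task is cancelled with {{Future.cancel(true)}}, which interrupts the worker thread, and the loop polls the interrupt flag. {{TaskInterruptedException}} is a hypothetical stand-in for {{CompactionInterruptedException}}, and the empty loop body stands in for iterating over partitions.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class InterruptCheckSketch {
    /** Hypothetical stand-in for CompactionInterruptedException. */
    static class TaskInterruptedException extends RuntimeException {}

    /** Starts a long-running task, cancels its Future, and reports whether it stopped. */
    static boolean runAndAbort() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);
        AtomicBoolean sawInterrupt = new AtomicBoolean(false);

        Runnable task = () -> {
            started.countDown();
            try {
                while (true) {
                    // Checked once per iteration, as described above.
                    if (Thread.currentThread().isInterrupted())
                        throw new TaskInterruptedException();
                    // ... process the next partition here ...
                }
            } catch (TaskInterruptedException e) {
                sawInterrupt.set(true);
            } finally {
                finished.countDown();
            }
        };

        try {
            Future<?> future = executor.submit(task);
            started.await();
            future.cancel(true);                 // true = interrupt the running thread
            boolean done = finished.await(5, TimeUnit.SECONDS);
            return done && sawInterrupt.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAndAbort());
    }
}
```

Note that {{isInterrupted()}} does not clear the flag, so any cleanup that runs afterwards on the same thread still sees it as interrupted.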

However, this is not very safe, because it can generate a 
{{ClosedByInterruptException}} if we're blocked on an I/O operation, and we 
currently treat any {{IOException}} as a corrupt sstable. Furthermore, an 
interrupted thread is not able to abort the transaction when getting a 
{{CompactionInterruptedException}}. In order to solve this we could special-case 
interruptions in many places (readers, transaction aborting, etc.), but even 
this wouldn't guarantee we're safe, so this is probably a bad smell.
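
The I/O hazard can be reproduced outside Cassandra with plain {{java.nio}}. In the sketch below a thread whose interrupt flag is set reads from a {{FileChannel}}: the channel is closed and the read fails with {{ClosedByInterruptException}}, which is a subclass of {{IOException}}, so a generic {{catch (IOException e)}} cannot tell it apart from a real read error. The temporary file and the method name are made up for the illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class InterruptedReadSketch {
    /** Returns true if reading on an interrupted thread fails with ClosedByInterruptException. */
    static boolean interruptedReadThrowsIOException() {
        Path tmp = null;
        try {
            tmp = Files.createTempFile("interrupted-read-sketch", ".bin");
            Files.write(tmp, new byte[] {1, 2, 3});
            try (FileChannel channel = FileChannel.open(tmp, StandardOpenOption.READ)) {
                Thread.currentThread().interrupt();      // simulate Future.cancel(true)
                try {
                    channel.read(ByteBuffer.allocate(3));
                    return false;                        // read unexpectedly succeeded
                } catch (IOException e) {
                    // A handler that only looks for IOException lands here too.
                    return e instanceof ClosedByInterruptException;
                }
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        } finally {
            Thread.interrupted();                        // clear the flag before cleanup
            if (tmp != null) {
                try { Files.deleteIfExists(tmp); } catch (IOException ignored) { }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(interruptedReadThrowsIOException());
    }
}
```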

A cleaner option that I will be doing in the next iteration is to associate a 
{{CompactionHolder}} with a {{ListenableFuture}} as soon as the anti-compaction 
or validation is submitted, so we can abort it safely without interrupting the 
compaction thread.
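
A rough sketch of that idea, under my own assumptions rather than the eventual design: a stop handle is registered under the job id at submission time, before the task starts running, and the task polls it cooperatively. {{StopHandle}}, the {{handles}} map and {{submit}} are hypothetical names, and a plain {{java.util.concurrent.Future}} is used in place of Guava's {{ListenableFuture}} to keep the example self-contained.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class StopHandleSketch {
    /** Hypothetical cooperative stop flag, playing the role of a compaction holder. */
    static class StopHandle {
        private final AtomicBoolean stopRequested = new AtomicBoolean(false);
        void stop() { stopRequested.set(true); }
        boolean isStopRequested() { return stopRequested.get(); }
    }

    static final Map<UUID, StopHandle> handles = new ConcurrentHashMap<>();

    /** Registers the handle first, then submits the task that polls it. */
    static Future<String> submit(ExecutorService executor, UUID jobId) {
        StopHandle handle = new StopHandle();
        handles.put(jobId, handle);              // visible before the task ever runs
        return executor.submit(() -> {
            try {
                while (!handle.isStopRequested()) {
                    Thread.yield();              // ... process the next partition here ...
                }
                // Cleanup runs on a thread that was never interrupted.
                return Thread.currentThread().isInterrupted() ? "interrupted" : "stopped";
            } finally {
                handles.remove(jobId);
            }
        });
    }

    /** Submits a task, requests a stop through the handle, and returns the task result. */
    static String demo() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            UUID jobId = UUID.randomUUID();
            Future<String> future = submit(executor, jobId);
            handles.get(jobId).stop();           // safe even if the task has not started yet
            return future.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the handle exists from the moment of submission, a stop requested before the first iteration is still observed, and no I/O is cut short by an interrupt.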

> Node Tool command to stop repair
> --------------------------------
>
>                 Key: CASSANDRA-3486
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3486
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>         Environment: JVM
>            Reporter: Vijay
>            Assignee: Paulo Motta
>            Priority: Minor
>              Labels: repair
>             Fix For: 2.1.x
>
>         Attachments: 0001-stop-repair-3583.patch
>
>
> After CASSANDRA-1740, if the validation compaction is stopped then the repair 
> will hang. This ticket will allow users to kill the original repair.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
