[
https://issues.apache.org/jira/browse/CASSANDRA-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255014#comment-15255014
]
Paulo Motta commented on CASSANDRA-3486:
----------------------------------------
Attaching [preliminary
patch|https://github.com/pauloricardomg/cassandra/tree/3486-trunk] in case
anyone wants to have a look or give feedback before the review-ready version.
*Current state*
* Add {{nodetool repair --list}} to list ongoing repair jobs (parent repair
sessions) on the local node
* Add {{nodetool repair --abort <jobId>}} and {{nodetool repair --abort-all}}
to abort a specific job or all jobs
* Any participant can abort the repair job:
** When a participant receives an abort request, it sends an abort message to
the coordinator and aborts its local tasks
** When a coordinator receives an abort message or abort request, it sends an
abort message to all participants and aborts its local tasks, failing the repair
job
* Add abort support to {{StreamResultFuture}} and {{StreamSession}}
* Refactor {{ActiveRepairService}} and {{RepairMessageVerbHandler}}
* Add [dtests|https://github.com/pauloricardomg/cassandra-dtest/tree/3486] that
abort repair on the coordinator and on participants during different phases
(validation, sync, anticompaction)
* Fix races and leaks found during dtests
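The abort flow described above can be sketched with a small in-memory model. This is only an illustration of the message flow, not code from the patch: the {{Node}}, {{Coordinator}} and {{Participant}} classes and their method names are hypothetical, and real messaging is replaced by direct method calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of the abort flow; none of these types exist in Cassandra.
class RepairAbortSketch {
    static class Node {
        final String name;
        boolean aborted = false;

        Node(String name) { this.name = name; }

        // Stands in for aborting local validation, sync and anticompaction tasks. Idempotent.
        void abortLocalTasks() { aborted = true; }
    }

    static class Coordinator extends Node {
        final List<Participant> participants = new ArrayList<>();
        boolean jobFailed = false;

        Coordinator(String name) { super(name); }

        // Handles both an operator abort request and an abort message from a participant.
        void onAbort() {
            if (aborted) return;              // already handled, do not broadcast twice
            abortLocalTasks();
            jobFailed = true;                 // the repair job fails as a whole
            for (Participant p : participants) p.onAbortMessage();
        }
    }

    static class Participant extends Node {
        final Coordinator coordinator;

        Participant(String name, Coordinator coordinator) {
            super(name);
            this.coordinator = coordinator;
            coordinator.participants.add(this);
        }

        // The operator asked this node to abort: abort locally and notify the coordinator.
        void onAbortRequest() {
            if (aborted) return;
            abortLocalTasks();
            coordinator.onAbort();
        }

        // Abort message broadcast by the coordinator.
        void onAbortMessage() { abortLocalTasks(); }
    }

    public static void main(String[] args) {
        Coordinator c = new Coordinator("coordinator");
        Participant p1 = new Participant("p1", c);
        Participant p2 = new Participant("p2", c);
        p1.onAbortRequest();
        System.out.println("job failed: " + c.jobFailed + ", p2 aborted: " + p2.aborted);
    }
}
```

The early returns on {{aborted}} are what keep the coordinator's broadcast from bouncing back and forth between nodes in this model.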
*Limitations and next steps*
While compactions have abort/stop support via
{{CompactionManager.stopCompactionById}},
we cannot guarantee that a compaction will be aborted when its repair is
aborted, because its abort handler ({{Holder}}) is only registered during
iteration via the {{CompactionIterator}}. If we stop the compaction before that
point, the task is not aborted and will execute even if its parent repair
session was aborted. Furthermore, an anti-compaction is split into multiple
subcompactions, so this method only stops the currently running subcompaction.
In order to overcome this, I aborted the compaction task {{Future}} directly,
which causes the task thread to be interrupted, so I check for
{{Thread.currentThread().isInterrupted()}} during iteration and throw a
{{CompactionInterruptedException}} if this is true, causing the compaction to
be aborted (by brute force).
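A minimal sketch of this brute-force approach, using only the JDK. It is not the patch code: {{CompactionInterrupted}} is a stand-in for {{CompactionInterruptedException}}, and {{compact}} and {{runAndAbort}} are hypothetical names for a loop over rows and a small driver that cancels the task's {{Future}}.

```java
import java.util.Iterator;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class InterruptCheckSketch {
    // Stand-in for Cassandra's CompactionInterruptedException (assumption for this sketch).
    static class CompactionInterrupted extends RuntimeException {
        CompactionInterrupted(String message) { super(message); }
    }

    // Iterates rows and checks the thread's interrupt flag before each one.
    static int compact(Iterator<Integer> rows) {
        int processed = 0;
        while (rows.hasNext()) {
            if (Thread.currentThread().isInterrupted())
                throw new CompactionInterrupted("aborted after " + processed + " rows");
            rows.next();
            processed++;
        }
        return processed;
    }

    // Starts an endless "compaction", cancels its Future, and reports whether the
    // task ended with CompactionInterrupted.
    static boolean runAndAbort() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);
        AtomicBoolean sawAbort = new AtomicBoolean(false);
        Iterator<Integer> endless = new Iterator<Integer>() {
            @Override public boolean hasNext() { return true; }
            @Override public Integer next() { started.countDown(); return 0; }
        };
        Future<?> task = executor.submit(() -> {
            try {
                compact(endless);
            } catch (CompactionInterrupted e) {
                sawAbort.set(true);
            } finally {
                finished.countDown();
            }
        });
        try {
            started.await();          // make sure the task is actually iterating
            task.cancel(true);        // cancel(true) interrupts the running worker thread
            return finished.await(5, TimeUnit.SECONDS) && sawAbort.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("aborted via interrupt: " + runAndAbort());
    }
}
```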
However, this is not very safe, because it can generate a
{{ClosedByInterruptException}} if we're blocked on an I/O operation, and we
currently treat any {{IOException}} as a corrupt sstable. Furthermore, an
interrupted thread is not able to abort the transaction when getting a
{{CompactionInterruptedException}}. In order to solve this we could
special-case interruptions in many places (readers, transaction aborting, etc.),
but even this wouldn't guarantee we're safe, so this is probably a bad smell.
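The I/O hazard can be reproduced with plain JDK classes. The sketch below is an illustration only and does not touch Cassandra's readers: it reads a temporary file through a {{FileChannel}} while the thread's interrupt flag is set, which closes the channel and raises {{ClosedByInterruptException}}, a subclass of {{IOException}}. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class InterruptedReadSketch {
    // Reads from a FileChannel with the interrupt flag set and reports what happened.
    static String readWhileInterrupted() {
        Path file = null;
        try {
            file = Files.createTempFile("sstable-sketch", ".db");
            Files.write(file, new byte[] {1, 2, 3, 4});
            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
                Thread.currentThread().interrupt();      // simulate Future.cancel(true)
                channel.read(ByteBuffer.allocate(4));    // interruptible channel operation
                return "read succeeded";
            }
        } catch (ClosedByInterruptException e) {
            // Must be caught before IOException, otherwise it is indistinguishable
            // from a genuine read failure.
            return "interrupted";
        } catch (IOException e) {
            return "io error";
        } finally {
            Thread.interrupted();                        // clear the flag before cleanup
            try {
                if (file != null) Files.deleteIfExists(file);
            } catch (IOException ignored) {
                // best-effort cleanup of the temporary file
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(readWhileInterrupted());
    }
}
```

A handler that only catches {{IOException}} would take the "io error" path here, which is the misclassification described above.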
A cleaner option that I will be doing in the next iteration is to associate a
{{CompactionHolder}} with a {{ListenableFuture}} as soon as the anti-compaction
or validation is submitted, so we can abort it safely without interrupting the
compaction thread.
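One possible shape for this, sketched with JDK types only. It is an assumption about the design, not the planned implementation: {{Holder}} here is a hypothetical class, {{CompletableFuture}} stands in for Guava's {{ListenableFuture}}, and the "compaction" is a simple counting loop. The point is that the stop flag exists from submission time, so a task that is still queued stops as soon as it starts, and no thread is interrupted.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

class HolderAtSubmitSketch {
    // Hypothetical holder created at submission time, before the task runs.
    static class Holder {
        private final AtomicBoolean stopRequested = new AtomicBoolean(false);
        final CompletableFuture<Integer> result = new CompletableFuture<>();

        void stop() { stopRequested.set(true); }
        boolean isStopRequested() { return stopRequested.get(); }
    }

    // Submits a task that polls its holder's stop flag on every row.
    static Holder submit(ExecutorService executor, int rows) {
        Holder holder = new Holder();
        executor.execute(() -> {
            int processed = 0;
            for (int i = 0; i < rows; i++) {
                if (holder.isStopRequested()) {
                    holder.result.completeExceptionally(
                        new IllegalStateException("stopped after " + processed + " rows"));
                    return;
                }
                processed++;
            }
            holder.result.complete(processed);
        });
        return holder;
    }

    // Stops a task while it is still queued behind another one.
    static boolean stopBeforeStart() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            CountDownLatch gate = new CountDownLatch(1);
            executor.execute(() -> {
                try {
                    gate.await();                 // keeps the single worker busy
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            Holder holder = submit(executor, 1000);
            holder.stop();                        // requested before the task has started
            gate.countDown();
            try {
                holder.result.get(5, TimeUnit.SECONDS);
                return false;                     // the task ran to completion
            } catch (ExecutionException e) {
                return e.getCause() instanceof IllegalStateException;
            } catch (InterruptedException | TimeoutException e) {
                return false;
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("stopped before start: " + stopBeforeStart());
    }
}
```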
> Node Tool command to stop repair
> --------------------------------
>
> Key: CASSANDRA-3486
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3486
> Project: Cassandra
> Issue Type: Improvement
> Components: Tools
> Environment: JVM
> Reporter: Vijay
> Assignee: Paulo Motta
> Priority: Minor
> Labels: repair
> Fix For: 2.1.x
>
> Attachments: 0001-stop-repair-3583.patch
>
>
> After CASSANDRA-1740, If the validation compaction is stopped then the repair
> will hang. This ticket will allow users to kill the original repair.