[ https://issues.apache.org/jira/browse/CASSANDRA-9097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuki Morishita updated CASSANDRA-9097:
--------------------------------------
    Attachment: 0001-Remove-parent-session-on-remotes-when-repair-fails.patch

When a repair session fails, we only remove the parent repair session on the
coordinator.
Currently, the parent repair session is removed only when an exception is
thrown from ANTIENTROPY_STAGE, but validation and streaming happen on separate
threads, so we have to clean them up separately.

I introduced a new CleanupMessage and send it only to the nodes that pass the
version check, so adding the new message should be fine.
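
As a rough illustration of the version-gated cleanup described above (this is
not the attached patch; the class names, the threshold value, and the in-memory
"send" are assumptions made for this sketch):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

final class CleanupSketch {
    // Hypothetical threshold: the lowest messaging version that understands the new message.
    static final int MIN_CLEANUP_VERSION = 8;

    /** Minimal stand-in for a node: a messaging version plus its registered parent sessions. */
    static final class Node {
        final String name;
        final int messagingVersion;
        final Set<UUID> parentSessions = new HashSet<>();

        Node(String name, int messagingVersion) {
            this.name = name;
            this.messagingVersion = messagingVersion;
        }
    }

    /**
     * Called on the coordinator when a repair fails: drops the local parent session,
     * then "sends" a cleanup to each participant that passes the version check.
     * Returns the names of the participants that were notified.
     */
    static List<String> onRepairFailure(Node coordinator, List<Node> participants, UUID parentSession) {
        coordinator.parentSessions.remove(parentSession);
        List<String> notified = new ArrayList<>();
        for (Node participant : participants) {
            if (participant.messagingVersion >= MIN_CLEANUP_VERSION) {
                // Stand-in for delivering the cleanup message over the network.
                participant.parentSessions.remove(parentSession);
                notified.add(participant.name);
            }
        }
        return notified;
    }
}
```

In this sketch a participant below the threshold is simply skipped, so it never
receives a message type it cannot deserialize.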

Note that this is not an issue for 2.2+, since there we send the successfully
repaired ranges, though we still need to add the new message to trunk for
compatibility.

I will (try to) write a dtest to cover this scenario, but I am submitting the
patch first for review.

> Repeated incremental nodetool repair results in failed repairs due to running 
> anticompaction
> --------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9097
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9097
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Gustav Munkby
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 2.2 beta 1, 2.1.6
>
>         Attachments: 
> 0001-Remove-parent-session-on-remotes-when-repair-fails.patch, 
> 0001-Wait-for-anticompaction-to-finish.patch
>
>
> I'm trying to synchronize incremental repairs over multiple nodes in a 
> Cassandra cluster, and it does not seem to be easily achievable.
> In principle, the process iterates through the nodes of the cluster and 
> performs `nodetool -h $NODE repair --incremental`, but that sometimes fails 
> on subsequent nodes. The reason for failing seems to be that the repair 
> returns as soon as the repair and the _local_ anticompaction have completed, 
> but does not guarantee that remote anticompactions are complete. If I 
> subsequently try to issue another repair command, it fails to start (and 
> terminates with failure after about one minute). It usually isn't a problem, 
> as the local anticompaction typically involves as much (or more) data as the 
> remote ones, but sometimes not.
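
The node-by-node loop described in the report could be sketched as follows.
This is only an illustration: the class and method names are hypothetical, and
the command runner is injected so the loop can be exercised without a cluster.
Only the `nodetool -h $NODE repair --incremental` invocation comes from the
report itself.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

final class SequentialRepairSketch {
    /** Builds the command from the report: nodetool -h $NODE repair --incremental */
    static List<String> repairCommand(String node) {
        return Arrays.asList("nodetool", "-h", node, "repair", "--incremental");
    }

    /**
     * Repairs the nodes one after another, stopping at the first failure.
     * The runner maps a command to its exit code; returns the first node whose
     * repair exited non-zero, or null if every repair succeeded.
     */
    static String firstFailedNode(List<String> nodes, ToIntFunction<List<String>> runner) {
        for (String node : nodes) {
            if (runner.applyAsInt(repairCommand(node)) != 0) {
                return node;
            }
        }
        return null;
    }

    /** Runner that actually spawns the process; returns -1 if it cannot be started or is interrupted. */
    static int runProcess(List<String> command) {
        try {
            return new ProcessBuilder(command).inheritIO().start().waitFor();
        } catch (IOException e) {
            return -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }
}
```

Against a real cluster this would be called as
`firstFailedNode(nodes, SequentialRepairSketch::runProcess)`. A zero exit code
from one node says nothing about whether remote anticompactions have finished,
which is the gap the report describes.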



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
