[ 
https://issues.apache.org/jira/browse/CASSANDRA-12901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15667927#comment-15667927
 ] 

Paulo Motta commented on CASSANDRA-12901:
-----------------------------------------

The dtest is testing two scenarios:
1 - failed replica syncing to other participant
2 - failed replica syncing from coordinator

However, an error in the original dtest meant that only case 1 was actually 
being tested. With case 2 fixed, the repair session now fails because 
streaming breaks, but streaming fails before the FD detects that the node is 
down. As a result, an anti-compaction request is still sent to the failed 
replica, which never replies, so repair hangs again. This uncovered another 
bug: if a node fails in the middle of anti-compaction, repair will also hang. 
I will address this in the same ticket, but will keep it as PA (Patch 
Available) to get initial feedback.
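
To illustrate the failure mode, here is a minimal sketch in Python (not 
Cassandra code). The names PendingTask, Coordinator and on_endpoint_down are 
hypothetical and stand in for a sync or anti-compaction task, the repair 
coordinator, and a failure-detector callback. It assumes the coordinator 
simply blocks until the remote node replies, so a dead node means waiting 
forever unless something fails the task on its behalf.

```python
import threading


class PendingTask:
    """Hypothetical task the coordinator waits on, owned by one remote endpoint."""

    def __init__(self, endpoint):
        self.endpoint = endpoint
        self.error = None
        self._done = threading.Event()

    def is_done(self):
        return self._done.is_set()

    def complete(self):
        # Called when the remote endpoint replies successfully.
        self._done.set()

    def fail(self, reason):
        # Called when the task can no longer succeed.
        self.error = reason
        self._done.set()

    def wait(self, timeout=None):
        # Returns True once the task has finished (successfully or not),
        # False if the timeout expired first. With timeout=None this blocks
        # until someone calls complete() or fail().
        return self._done.wait(timeout)


class Coordinator:
    """Hypothetical coordinator that tracks outstanding remote tasks."""

    def __init__(self):
        self.pending = []

    def submit(self, endpoint):
        task = PendingTask(endpoint)
        self.pending.append(task)
        return task

    def on_endpoint_down(self, endpoint):
        # Failure-detector style callback: fail every unfinished task that
        # is still waiting on the dead endpoint, so waiters are released.
        for task in self.pending:
            if task.endpoint == endpoint and not task.is_done():
                task.fail(f"{endpoint} died")
```

If the coordinator is never told that the endpoint went down, 
task.wait() without a timeout never returns, which is the hang described 
above. Once on_endpoint_down() is called for that endpoint, every waiter is 
released with an error instead.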

> Repair may hang if node dies during sync
> ----------------------------------------
>
>                 Key: CASSANDRA-12901
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12901
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>            Reporter: Paulo Motta
>            Assignee: Paulo Motta
>
> Since the repair coordinator unregisters from the FD after validation 
> (CASSANDRA-3569), if the initiator of a RemoteSyncTask fails, the coordinator 
> will never know that the sync task failed and will hang.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
