[ 
https://issues.apache.org/jira/browse/CASSANDRA-12901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668558#comment-15668558
 ] 

Yuki Morishita commented on CASSANDRA-12901:
--------------------------------------------

You are right that when the remote streaming node (the node that receives the 
SyncRequest message) dies, the coordinator is never notified of the failure and 
the repair hangs. I'd rather it not hang, so bringing back the FD would be fine, 
even with the false positives it brings.

bq. but streaming fails before the FD detects the node is down, so an 
anti-compaction request is being sent to the failed replica

Hmm, yeah, looks like this can happen. It looks like we need to mark the failed 
node and exclude it from the anti-compacting nodes rather than relying on the FD 
alive check in {{AntiCompactionTask}}.
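
To make the idea concrete, here is a rough sketch of what "mark the failed node and exclude it" could look like. This is not Cassandra code: the class {{FailedParticipantTracker}} and its methods are hypothetical names, and endpoints are plain strings for illustration only. The point is that the set of anti-compaction targets is derived from failures recorded during sync, not from asking the FD at the moment the request is sent.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical helper (not part of Cassandra): remembers which repair
 * participants failed during sync so they can be skipped later.
 */
final class FailedParticipantTracker {
    // All endpoints taking part in the repair session (assumed fixed up front).
    private final Set<String> participants;
    // Endpoints whose sync task failed; thread-safe because failures may be
    // reported from different callback threads.
    private final Set<String> failed = ConcurrentHashMap.newKeySet();

    FailedParticipantTracker(Set<String> participants) {
        this.participants =
            Collections.unmodifiableSet(new LinkedHashSet<>(participants));
    }

    /** Record a sync failure; endpoints outside the session are ignored. */
    void markFailed(String endpoint) {
        if (participants.contains(endpoint)) {
            failed.add(endpoint);
        }
    }

    /** Participants that should still receive an anti-compaction request. */
    Set<String> antiCompactionTargets() {
        Set<String> targets = new LinkedHashSet<>(participants);
        targets.removeAll(failed);
        return targets;
    }
}
```

With something like this, a replica that dies mid-stream is dropped from the target set as soon as its sync failure is recorded, even if the FD has not yet convicted it.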


> Repair may hang if node dies during sync
> ----------------------------------------
>
>                 Key: CASSANDRA-12901
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12901
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>            Reporter: Paulo Motta
>            Assignee: Paulo Motta
>
> Since the repair coordinator unregisters from the FD after validation 
> (CASSANDRA-3569), if the initiator of a RemoteSyncTask fails, the coordinator 
> will never learn that the sync task failed and will hang.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
