[ https://issues.apache.org/jira/browse/CASSANDRA-12901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668558#comment-15668558 ]
Yuki Morishita commented on CASSANDRA-12901:
--------------------------------------------
You are right that when the remote streaming node (the node that receives the
SyncRequest message) dies, the coordinator is never notified of the failure and
the repair hangs. I'd rather make it not hang, so bringing back the FD would be
fine despite the false positives it brings.
bq. but streaming fails before the FD detects the node is down, so an
anti-compaction request is being sent to the failed replica
Hmm, yeah, looks like this can happen. It looks like we need to mark the failed
node and exclude it from the anti-compacting nodes rather than relying on the FD
alive check in {{AntiCompactionTask}}.
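A minimal sketch of that idea, assuming the coordinator keeps its own record of
participants whose sync failed. The class {{RepairParticipants}} and its methods
are hypothetical names for illustration and are not part of Cassandra; endpoints
are plain strings here to keep the example self-contained.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not Cassandra code: remembers which repair
// participants failed during sync so they can be skipped later.
final class RepairParticipants
{
    private final Set<String> all;
    private final Set<String> failed = ConcurrentHashMap.newKeySet();

    RepairParticipants(Set<String> all)
    {
        this.all = new LinkedHashSet<>(all);
    }

    // Called when a sync/streaming task involving this endpoint fails.
    // Endpoints that are not participants are ignored.
    void markFailed(String endpoint)
    {
        if (all.contains(endpoint))
            failed.add(endpoint);
    }

    // Endpoints that should still receive an anti-compaction request:
    // all participants minus those explicitly marked as failed.
    Set<String> antiCompactionTargets()
    {
        Set<String> targets = new LinkedHashSet<>(all);
        targets.removeAll(failed);
        return targets;
    }
}
```

With this shape, the decision to skip a replica depends on the failure the
coordinator already observed, not on whether the FD has caught up yet.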
> Repair may hang if node dies during sync
> ----------------------------------------
>
> Key: CASSANDRA-12901
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12901
> Project: Cassandra
> Issue Type: Bug
> Components: Streaming and Messaging
> Reporter: Paulo Motta
> Assignee: Paulo Motta
>
> Since the repair coordinator unregisters from the FD after validation
> (CASSANDRA-3569), if the initiator of a RemoteSyncTask fails, the coordinator
> will never know the sync task failed and will hang.