[ 
https://issues.apache.org/jira/browse/CASSANDRA-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15663961#comment-15663961
 ] 

Paulo Motta commented on CASSANDRA-10446:
-----------------------------------------

bq. Which means we obviously cannot mark anything "repaired" if some node was 
down. This seems to be what the last patch is doing, but some of the 
discussions above seem to suggest this could be done differently in the 
future, after CASSANDRA-9143 in particular. Did I misread those discussions or 
did I miss something more fundamental?

When you trigger a repair command (the parent repair session), it starts many 
child repair sessions, typically one per vnode subrange. At the end of the 
parent repair session, only the ranges of successful child repair sessions are 
anti-compacted. A subset of the child sessions may have failed, due to node 
failures or other causes, so their ranges cannot be marked as repaired. 
Likewise, when you run a repair with {{--force}}, only a subset of the child 
repair sessions may involve down nodes, so we can still mark the ranges of 
successful child repair sessions (the ones where all nodes were up) as 
repaired. This is what the patch currently does, and this behavior will be 
kept after CASSANDRA-9143.
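
To make the rule above concrete, here is a minimal sketch in Python. It is a toy model, not Cassandra's code: {{ChildSession}}, {{run_parent_repair}} and the node names are hypothetical. It also assumes that without {{--force}} a child session with a down replica simply counts as failed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChildSession:
    """Hypothetical model of one child repair session over one token range."""
    token_range: tuple        # (start_token, end_token)
    replicas: frozenset       # nodes that own this range
    failed: bool = False      # the session failed for some other reason


def run_parent_repair(sessions, down_nodes, force):
    """Return (synced_ranges, repaired_ranges) for a parent repair session.

    Assumption: without force, a session with a down replica is not run.
    With force, it runs against the live replicas, but its range must not
    be marked as repaired because not every replica took part.
    """
    down = frozenset(down_nodes)
    synced, repaired = [], []
    for session in sessions:
        if session.failed:
            continue                      # failed sessions contribute nothing
        has_down_replica = bool(session.replicas & down)
        if has_down_replica and not force:
            continue                      # treated as a failed session
        synced.append(session.token_range)
        if not has_down_replica:
            # Only ranges where all replicas were up get anti-compacted.
            repaired.append(session.token_range)
    return synced, repaired


sessions = [
    ChildSession((0, 100), frozenset({"n1", "n2", "n3"})),
    ChildSession((100, 200), frozenset({"n2", "n3", "n4"})),
    ChildSession((200, 300), frozenset({"n1", "n2", "n3"}), failed=True),
]
synced, repaired = run_parent_repair(sessions, down_nodes={"n4"}, force=True)
print("synced:", synced)
print("marked repaired:", repaired)
```

In this model, {{--force}} widens the set of ranges that get synced but never the set of ranges that get marked as repaired.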

What was brought up here, and may have confused things a bit, is that in both 
cases (with and without {{--force}}) streamed sstables are always marked as 
repaired. This may cause problems in some failure edge cases (if a repair 
session fails after part of its syncs have completed), and this limitation in 
particular will be addressed in CASSANDRA-9143.
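
The edge case can be illustrated with a small Python toy model. This is only a sketch of the scenario described above, not Cassandra's streaming code: {{SSTable}}, {{run_syncs}} and the {{fail_at}} parameter are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class SSTable:
    """Hypothetical stand-in for an sstable received via streaming."""
    owner: str
    token_range: tuple
    repaired: bool = False


def run_syncs(token_range, sync_tasks, fail_at=None):
    """Run the (source, target) sync tasks of one child repair session.

    Each completed sync leaves a streamed sstable on the target, and that
    sstable is flagged as repaired right away. If the task at index fail_at
    fails, the session fails, but sstables streamed earlier keep the flag.
    """
    received = []
    for index, (source, target) in enumerate(sync_tasks):
        if index == fail_at:
            return False, received        # session failed part-way through
        received.append(SSTable(owner=target, token_range=token_range, repaired=True))
    return True, received


ok, streamed = run_syncs(
    (0, 100),
    [("n1", "n2"), ("n1", "n3"), ("n2", "n3")],
    fail_at=2,
)
print("session succeeded:", ok)
print("streamed sstables still marked repaired:", [t.owner for t in streamed if t.repaired])
```

Here the session fails, so its range is not anti-compacted, yet the sstables from the two completed syncs remain marked as repaired.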

Does this address your concerns, or is there something else we may be missing?

> Run repair with down replicas
> -----------------------------
>
>                 Key: CASSANDRA-10446
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10446
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Blake Eggleston
>            Priority: Minor
>             Fix For: 4.0
>
>
> We should have an option of running repair when replicas are down. We can 
> call it -force.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
