[
https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436054#comment-15436054
]
Blake Eggleston commented on CASSANDRA-9143:
--------------------------------------------
bq. it sounds slightly different from the original problem description
Both are really manifestations of the same root problem: incremental repair
behaves unpredictably because data being repaired isn't kept separate from
unrepaired data during repair. Maybe we should expand the problem description,
and close CASSANDRA-8858 as a dupe?
bq. How do you plan to perform anti-compaction up-front?
We’d have to be optimistic and anti-compact all the tables and ranges we’re
going to be repairing prior to validation. Obviously, failed ranges would have
to be re-anticompacted back into unrepaired. The cost of this would have to be
weighed against the higher network IO caused by the current state of things,
and against the frequency of failed ranges.
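For illustration, a rough sketch of what that flow could look like (the helper
methods here are hypothetical placeholders, not existing Cassandra APIs):
{code:java}
// Rough sketch only. anticompactIntoPending(), validateAndSync() and
// revertToUnrepaired() are hypothetical helpers, not existing Cassandra APIs.
void repairWithUpfrontAnticompaction(UUID sessionId,
                                     Collection<ColumnFamilyStore> tables,
                                     Collection<Range<Token>> ranges)
{
    // Optimistically split out the sstables covering the requested ranges
    // before any validation, so the data being repaired is kept separate
    // from unrepaired data for the duration of the session.
    for (ColumnFamilyStore cfs : tables)
        anticompactIntoPending(cfs, ranges, sessionId);

    // Run validation/streaming per range as usual, remembering failures.
    Set<Range<Token>> failed = new HashSet<>();
    for (Range<Token> range : ranges)
        if (!validateAndSync(sessionId, range, tables))
            failed.add(range);

    // Failed ranges have to be re-anticompacted back into unrepaired.
    if (!failed.isEmpty())
        for (ColumnFamilyStore cfs : tables)
            revertToUnrepaired(cfs, failed, sessionId);
}
{code}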
bq. I propose we start with the original idea of adding a 2PC to
anti-compaction as suggested in the ticket description and perhaps on the top
of that pursue anti-compaction checkpoints/hints in separate ticket
This only solves part of the problem. We’re still leaking repaired data during
compaction. I think it makes sense to talk about the overarching problem of
keeping repaired and unrepaired data separate first. We can still handle each
of the cases separately if it makes sense to.
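For reference, a coordinator-side sketch of the 2PC flow from the ticket
description below (sendAnticompactionRequest() and sendMarkRepaired() are
hypothetical helpers, and most error handling is elided):
{code:java}
// Rough coordinator-side sketch of steps 1-4 from the description below.
// sendAnticompactionRequest() and sendMarkRepaired() are hypothetical helpers.
boolean anticompactAndMarkRepaired(UUID sessionId,
                                   Collection<InetAddress> replicas,
                                   Collection<Range<Token>> ranges)
{
    // Phase 1: replicas split sstables on the repaired ranges but do not
    // mark them repaired yet.
    List<Future<Boolean>> acks = new ArrayList<>();
    for (InetAddress replica : replicas)
        acks.add(sendAnticompactionRequest(replica, sessionId, ranges));

    for (Future<Boolean> ack : acks)
    {
        try
        {
            if (!ack.get())
                return false; // a negative ack leaves everything unrepaired
        }
        catch (Exception e)
        {
            return false;
        }
    }

    // Phase 2: every replica acked, so tell them to mark the split sstables
    // repaired (this message could also be "hinted" to a briefly-down node).
    for (InetAddress replica : replicas)
        sendMarkRepaired(replica, sessionId);
    return true;
}
{code}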
> Improving consistency of repairAt field across replicas
> --------------------------------------------------------
>
> Key: CASSANDRA-9143
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9143
> Project: Cassandra
> Issue Type: Improvement
> Reporter: sankalp kohli
> Assignee: Blake Eggleston
> Priority: Minor
>
> We currently send an anticompaction request to all replicas. During this, a
> node will split sstables and mark the appropriate ones repaired.
> The problem is that this could fail on some replicas for many reasons,
> leading to problems in the next repair.
> This is what I am suggesting to improve it.
> 1) Send anticompaction request to all replicas. This can be done at session
> level.
> 2) During anticompaction, sstables are split but not marked repaired.
> 3) When we get positive ack from all replicas, coordinator will send another
> message called markRepaired.
> 4) On getting this message, replicas will mark the appropriate sstables as
> repaired.
> This will reduce the window of failure. We can also think of "hinting"
> markRepaired message if required.
> Also the sstables which are streamed can be marked as repaired, as is done
> now.