[
https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436054#comment-15436054
]
Blake Eggleston commented on CASSANDRA-9143:
--------------------------------------------
bq. it sounds slightly different from the original problem description
Both are really manifestations of the same root problem: incremental repair
behaves unpredictably because data being repaired isn't kept separate from
unrepaired data during repair. Maybe we should expand the problem description,
and close CASSANDRA-8858 as a dupe?
bq. How do you plan to perform anti-compaction up-front?
We’d have to be optimistic and anti-compact all the tables and ranges we’re
going to be repairing prior to validation. Obviously, failed ranges would have
to be re-anticompacted back into unrepaired. The cost of this would have to be
weighed against the higher network IO caused by the current state of things,
and against the frequency of failed ranges.
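For illustration, a rough sketch of what that flow could look like (the helper
methods here are hypothetical placeholders, not existing Cassandra APIs):
{code:java}
// Rough sketch only. anticompactIntoPending(), validateAndSync() and
// revertToUnrepaired() are hypothetical helpers, not existing Cassandra APIs.
void repairWithUpfrontAnticompaction(UUID sessionId,
                                     Collection<ColumnFamilyStore> tables,
                                     Collection<Range<Token>> ranges)
{
    // Optimistically split out the sstables covering the requested ranges
    // before any validation, so the data being repaired is kept separate
    // from unrepaired data for the duration of the session.
    for (ColumnFamilyStore cfs : tables)
        anticompactIntoPending(cfs, ranges, sessionId);

    // Run validation/streaming per range as usual, remembering failures.
    Set<Range<Token>> failed = new HashSet<>();
    for (Range<Token> range : ranges)
        if (!validateAndSync(sessionId, range, tables))
            failed.add(range);

    // Failed ranges have to be re-anticompacted back into unrepaired.
    if (!failed.isEmpty())
        for (ColumnFamilyStore cfs : tables)
            revertToUnrepaired(cfs, failed, sessionId);
}
{code}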
bq. I propose we start with the original idea of adding a 2PC to
anti-compaction as suggested in the ticket description and perhaps on the top
of that pursue anti-compaction checkpoints/hints in separate ticket
This only solves part of the problem. We’re still leaking repaired data during
compaction. I think it makes sense to talk about the overarching problem of
keeping repaired and unrepaired data separate first. We can still handle each
of the cases separately if it makes sense to.
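For reference, a coordinator-side sketch of the 2PC flow from the ticket
description below (sendAnticompactionRequest() and sendMarkRepaired() are
hypothetical helpers, and most error handling is elided):
{code:java}
// Rough coordinator-side sketch of steps 1-4 from the description below.
// sendAnticompactionRequest() and sendMarkRepaired() are hypothetical helpers.
boolean anticompactAndMarkRepaired(UUID sessionId,
                                   Collection<InetAddress> replicas,
                                   Collection<Range<Token>> ranges)
{
    // Phase 1: replicas split sstables on the repaired ranges but do not
    // mark them repaired yet.
    List<Future<Boolean>> acks = new ArrayList<>();
    for (InetAddress replica : replicas)
        acks.add(sendAnticompactionRequest(replica, sessionId, ranges));

    for (Future<Boolean> ack : acks)
    {
        try
        {
            if (!ack.get())
                return false; // a negative ack leaves everything unrepaired
        }
        catch (Exception e)
        {
            return false;
        }
    }

    // Phase 2: every replica acked, so tell them to mark the split sstables
    // repaired (this message could also be "hinted" to a briefly-down node).
    for (InetAddress replica : replicas)
        sendMarkRepaired(replica, sessionId);
    return true;
}
{code}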
> Improving consistency of repairAt field across replicas
> --------------------------------------------------------
>
> Key: CASSANDRA-9143
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9143
> Project: Cassandra
> Issue Type: Improvement
> Reporter: sankalp kohli
> Assignee: Blake Eggleston
> Priority: Minor
>
> We currently send an anticompaction request to all replicas. During this, a
> node will split sstables and mark the appropriate ones repaired.
> The problem is that this could fail on some replicas for many reasons,
> leading to problems in the next repair.
> This is what I am suggesting to improve it.
> 1) Send anticompaction request to all replicas. This can be done at session
> level.
> 2) During anticompaction, sstables are split but not marked repaired.
> 3) When we get positive ack from all replicas, coordinator will send another
> message called markRepaired.
> 4) On getting this message, replicas will mark the appropriate sstables as
> repaired.
> This will reduce the window of failure. We can also think of "hinting"
> markRepaired message if required.
> Also the sstables which are streamed can be marked as repaired, as is done
> now.