[
https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795120#comment-13795120
]
Jason Brown commented on CASSANDRA-5351:
----------------------------------------
Interesting ideas here. However, here are some problems off the top of my head
that need to be addressed (in no particular order):
* Nodes that are fully replaced (see: Netflix running in the cloud). When a
node is replaced, we bootstrap the node by streaming data from the closest
peers (usually) in the local DC. The new node would not have anti-compacted
sstables, as it has never had a chance to repair. I'm not sure if bootstrapped
data can be considered anti-compacted through commutativity; it might be true,
but I'd need to think about it more. Assuming not, when this new node is
involved in any repair, it would generate a different merkle tree than its
already-repaired peers, and thus all hell would break loose streaming
already-repaired data to every node involved in the repair, worse than today's
repair (think streaming TBs of data across multiple Amazon datacenters). If we
can prove that the new node's data is commutatively repaired just by bootstrap,
then this is not a problem as such. Note this also affects move (to a lesser
degree) and rebuild.
* Consider nodes A, B, and C. If nodes A and B successfully repair, but C
fails to repair with them (due to partitioning, an app crash, etc.) during the
repair, C is forced to do an -ipr repair, as A and B have already
anti-compacted and that is the only way C will be able to repair against them.
* If the operator chooses to cancel the repair, we are left in an indeterminate
state wrt which nodes have successfully completed repairs with one another
(similar to the last point).
* Local-DC repair vs. global repair is largely incompatible with this. It
looks like you get one shot at repairing each sstable's range, so if you
choose to do a local-DC repair with an sstable, you are forced to do -ipr if
you later want to repair globally.
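The "one shot" problem in the last bullet can be sketched with a hypothetical model (the field names and functions are illustrative, not Cassandra's actual API): once an sstable is marked repaired, even by a local-DC-only repair, an incremental repair skips it, so its data is never validated against remote-DC replicas again; only a full repair covers it.

```python
# Hypothetical sstable state after a local-DC-only repair marked sst-1.
sstables = [
    {"name": "sst-1", "repaired": True,  "scope": "local-dc"},
    {"name": "sst-2", "repaired": False, "scope": None},
]

def incremental_repair_candidates(tables):
    # Incremental repair only looks at unrepaired sstables, regardless of
    # how narrow the repair that marked the others actually was.
    return [t["name"] for t in tables if not t["repaired"]]

def full_repair_candidates(tables):
    # A full (-ipr) repair hashes everything.
    return [t["name"] for t in tables]

print(incremental_repair_candidates(sstables))  # ['sst-2']
print(full_repair_candidates(sstables))         # ['sst-1', 'sst-2']
```

Under this model, sst-1 can only ever be re-validated globally by falling back to -ipr, which is exactly the operator behavior worried about below.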
Note that these problems are magnified immensely when you run in multiple
datacenters, especially datacenters separated by great distances.
While none of these situations is unresolvable, it seems that there are many
non-obvious ways in which we can get into a non-deterministic state, where
operators will either see tons of data being streamed because nodes'
anti-compaction points differ, or will be forced to run -ipr without an easily
understood reason. I already see operators terminate repair jobs because "they
hang" or "take too long", for better or worse (mostly worse). At that point,
the operator is pretty much required to do an -ipr repair, which gets us back
into the same situation we are in today, but with more confusion and possibly
with -ipr as the default.
It would probably be good to run -ipr every n days/weeks/months as a best
practice anyway, but I worry about the very non-obvious edge cases this
introduces and the possibility that operators will simply fall back to using
-ipr whenever something goes bump or doesn't make sense.
Thanks for listening.
> Avoid repairing already-repaired data by default
> ------------------------------------------------
>
> Key: CASSANDRA-5351
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5351
> Project: Cassandra
> Issue Type: Task
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Lyuben Todorov
> Labels: repair
> Fix For: 2.1
>
>
> Repair has always built its merkle tree from all the data in a columnfamily,
> which is guaranteed to work but is inefficient.
> We can improve this by remembering which sstables have already been
> successfully repaired, and only repairing sstables new since the last repair.
> (This automatically makes CASSANDRA-3362 much less of a problem too.)
> The tricky part is, compaction will (if not taught otherwise) mix repaired
> data together with non-repaired. So we should segregate unrepaired sstables
> from the repaired ones.
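The segregation described in the quoted issue can be sketched as follows (a minimal model; the `(name, repaired_at)` tuples and bucketing function are illustrative assumptions, not Cassandra's actual compaction code): compaction must only ever merge sstables of the same repaired status, so repaired and unrepaired data never mix.

```python
# Hypothetical sstable descriptors: (name, repaired_at), where repaired_at
# is None for sstables that have never been through a successful repair.
sstables = [
    ("sstable-1", 1700000000),  # repaired
    ("sstable-2", None),        # unrepaired (freshly flushed or streamed)
    ("sstable-3", 1700000000),  # repaired
    ("sstable-4", None),        # unrepaired
]

def compaction_buckets(tables):
    """Bucket sstables so repaired and unrepaired data never compact together."""
    return {
        "repaired": [t for t in tables if t[1] is not None],
        "unrepaired": [t for t in tables if t[1] is None],
    }

buckets = compaction_buckets(sstables)
print(buckets["repaired"])    # sstable-1 and sstable-3 only
print(buckets["unrepaired"])  # sstable-2 and sstable-4 only
```

With this split, the next incremental repair can build its merkle tree from the unrepaired bucket alone, which is the efficiency win the ticket is after.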
--
This message was sent by Atlassian JIRA
(v6.1#6144)