[
https://issues.apache.org/jira/browse/CASSANDRA-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Marcus Eriksson reassigned CASSANDRA-8911:
------------------------------------------
Assignee: Marcus Eriksson
> Consider Mutation-based Repairs
> -------------------------------
>
> Key: CASSANDRA-8911
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8911
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Tyler Hobbs
> Assignee: Marcus Eriksson
> Fix For: 3.x
>
>
> We should consider a mutation-based repair to replace the existing streaming
> repair. While we're at it, we could do away with a lot of the complexity
> around merkle trees.
> I have not planned this out in detail, but here's roughly what I'm thinking:
> * Instead of building an entire merkle tree up front, just send the "leaves"
> one-by-one. Instead of dealing with token ranges, make the leaves primary
> key ranges. The PK ranges would need to be contiguous, so that the start of
> each range would match the end of the previous range. (The first and last
> leaves would need to be open-ended on one end of the PK range.) This would be
> similar to doing a read with paging.
> * Once one page of data is read, compute a hash of it and send it to the
> other replicas along with the PK range that it covers and a row count.
> * When the replicas receive the hash, they perform a read over the same PK
> range (using a LIMIT of the row count + 1) and compare hashes (unless the row
> counts don't match, in which case the hash comparison can be skipped).
> * If there is a mismatch, the replica will send a mutation covering that
> page's worth of data (ignoring the row count this time) to the source node.
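The per-page exchange described above could be sketched roughly as follows. This is illustrative Java only, not actual Cassandra internals; the class and method names (`PageRepairSketch`, `hashPage`, `pageMismatch`) are made up for the example, and the digest choice is arbitrary:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the per-page protocol: the source node reads one
// page, hashes it, and ships (PK range, row count, hash); each replica
// re-reads the same PK range with LIMIT rowCount + 1 and compares.
public class PageRepairSketch {

    /** Hash one page of serialized rows; any stable digest would do. */
    static byte[] hashPage(List<String> rows) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        for (String row : rows)
            md.update(row.getBytes(StandardCharsets.UTF_8));
        return md.digest();
    }

    /**
     * Replica-side check. Reading with LIMIT sourceCount + 1 means a longer
     * replica page shows up as a count mismatch, so the hash comparison is
     * skipped in that case. Returns true when a repair mutation is needed.
     */
    static boolean pageMismatch(List<String> replicaRows, int sourceCount,
                                byte[] sourceHash) throws Exception {
        if (replicaRows.size() != sourceCount)
            return true; // counts differ: mismatch without hashing
        return !Arrays.equals(hashPage(replicaRows), sourceHash);
    }
}
```

On a mismatch the replica would then send the whole page's worth of data back as a mutation, as described above.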
> Here are the advantages that I can think of:
> * With the current repair behavior of streaming, vnode-enabled clusters may
> need to stream hundreds of small SSTables. This results in increased
> compaction load on the receiving node. With the mutation-based approach,
> memtables would naturally merge these.
> * It's simple to throttle. For example, you could specify a target rate of
> rows/sec to repair.
> * It's easy to see what PK range has been repaired so far. This could make
> it simpler to resume a repair that fails midway.
> * Inconsistencies start to be repaired almost right away.
> * Less special code (?)
> * Wide partitions are no longer a problem.
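The rows/sec throttling mentioned above could be as simple as this sketch (hypothetical names, not Cassandra code). The repair loop processes one page, then asks how long to pause so the cumulative rate stays at or below the target:

```java
/**
 * Illustrative rows/sec throttle for a mutation-based repair loop
 * (hypothetical sketch, not Cassandra code).
 */
public class RepairThrottle {
    private final double targetRowsPerSec;
    private final long startNanos;

    RepairThrottle(double targetRowsPerSec, long startNanos) {
        this.targetRowsPerSec = targetRowsPerSec;
        this.startNanos = startNanos;
    }

    /**
     * Nanoseconds to sleep after having repaired totalRowsSent rows by
     * nowNanos; zero when we are already at or behind the target rate.
     */
    long pauseNanos(long totalRowsSent, long nowNanos) {
        // Earliest time at which totalRowsSent rows keeps us under the cap.
        long earliestNanos = startNanos
                + (long) (totalRowsSent / targetRowsPerSec * 1e9);
        return Math.max(0L, earliestNanos - nowNanos);
    }
}
```

Tracking the rate against a single counter like this also pairs naturally with the resumability point above: the same bookkeeping that records rows repaired can record the last completed PK range.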
> There are a few problems I can think of:
> * Counters. I don't know if this can be made safe, or if they need to be
> skipped.
> * To support incremental repair, we need to be able to read from only
> repaired sstables. Probably not too difficult to do.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)