[ 
https://issues.apache.org/jira/browse/CASSANDRA-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

C. Scott Andreas updated CASSANDRA-8911:
----------------------------------------
    Component/s: Repair

> Consider Mutation-based Repairs
> -------------------------------
>
>                 Key: CASSANDRA-8911
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8911
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Repair
>            Reporter: Tyler Hobbs
>            Priority: Major
>              Labels: repair
>             Fix For: 4.x
>
>
> We should consider a mutation-based repair to replace the existing streaming 
> repair.  While we're at it, we could do away with a lot of the complexity 
> around merkle trees.
> I have not planned this out in detail, but here's roughly what I'm thinking:
>  * Instead of building an entire merkle tree up front, just send the "leaves" 
> one-by-one.  Instead of dealing with token ranges, make the leaves primary 
> key ranges.  The PK ranges would need to be contiguous, so that the start of 
> each range would match the end of the previous range. (The first and last 
> leaves would need to be open-ended on one end of the PK range.) This would be 
> similar to doing a read with paging.
>  * Once one page of data is read, compute a hash of it and send it to the 
> other replicas along with the PK range that it covers and a row count.
>  * When the replicas receive the hash, the perform a read over the same PK 
> range (using a LIMIT of the row count + 1) and compare hashes (unless the row 
> counts don't match, in which case this can be skipped).
>  * If there is a mismatch, the replica will send a mutation covering that 
> page's worth of data (ignoring the row count this time) to the source node.
> Here are the advantages that I can think of:
>  * With the current repair behavior of streaming, vnode-enabled clusters may 
> need to stream hundreds of small SSTables.  This results in increased compact
> ion load on the receiving node.  With the mutation-based approach, memtables 
> would naturally merge these.
>  * It's simple to throttle.  For example, you could give a number of rows/sec 
> that should be repaired.
>  * It's easy to see what PK range has been repaired so far.  This could make 
> it simpler to resume a repair that fails midway.
>  * Inconsistencies start to be repaired almost right away.
>  * Less special code \(?\)
>  * Wide partitions are no longer a problem.
> There are a few problems I can think of:
>  * Counters.  I don't know if this can be made safe, or if they need to be 
> skipped.
>  * To support incremental repair, we need to be able to read from only 
> repaired sstables.  Probably not too difficult to do.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to