[ https://issues.apache.org/jira/browse/CASSANDRA-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15272689#comment-15272689 ]

Jeremiah Jordan edited comment on CASSANDRA-8911 at 5/5/16 5:35 PM:
--------------------------------------------------------------------

bq. not sure here, would be nice to be able to prioritise a given range for a 
table, say we lose an sstable for example, automatically repairing the range 
that sstable covered immediately.
bq. It should probably be just a single thread, one table after the other. And 
then maybe having a priority queue or something with ranges to repair 
immediately, wdyt Paulo Motta? This prio queue thing might be a bit of gold 
plating that we could do later.
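
For concreteness, the prio queue could be as little as a single drain thread 
over a {{PriorityBlockingQueue}}; a rough sketch (every name below is 
hypothetical, none of this is existing code):

{code:java}
// Every name below is hypothetical; this is just the shape of the idea.
import java.util.concurrent.PriorityBlockingQueue;

public class AutoRepairScheduler implements Runnable
{
    static final int URGENT = 0;   // e.g. a range we lost an sstable for
    static final int ROUTINE = 1;  // normal background pass

    static class RepairTask implements Comparable<RepairTask>
    {
        final int priority;
        final String table;
        final String range; // token/PK range, simplified to a string here

        RepairTask(int priority, String table, String range)
        {
            this.priority = priority;
            this.table = table;
            this.range = range;
        }

        public int compareTo(RepairTask o)
        {
            return Integer.compare(priority, o.priority);
        }
    }

    private final PriorityBlockingQueue<RepairTask> queue = new PriorityBlockingQueue<>();

    void submit(int priority, String table, String range)
    {
        queue.add(new RepairTask(priority, table, range));
    }

    // Single thread, one table/range after the other; urgent ranges jump
    // ahead of routine background work simply by sorting first in the queue.
    public void run()
    {
        while (!Thread.currentThread().isInterrupted())
        {
            try
            {
                repairRange(queue.take()); // the actual repair is elided
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
            }
        }
    }

    private void repairRange(RepairTask task) {}
}
{code}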

This discussion should probably be on CASSANDRA-10070?



> Consider Mutation-based Repairs
> -------------------------------
>
>                 Key: CASSANDRA-8911
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8911
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Tyler Hobbs
>            Assignee: Marcus Eriksson
>             Fix For: 3.x
>
>
> We should consider a mutation-based repair to replace the existing streaming 
> repair.  While we're at it, we could do away with a lot of the complexity 
> around merkle trees.
> I have not planned this out in detail, but here's roughly what I'm thinking:
>  * Instead of building an entire merkle tree up front, just send the "leaves" 
> one-by-one.  Instead of dealing with token ranges, make the leaves primary 
> key ranges.  The PK ranges would need to be contiguous, so that the start of 
> each range would match the end of the previous range. (The first and last 
> leaves would need to be open-ended on one end of the PK range.) This would be 
> similar to doing a read with paging.
>  * Once one page of data is read, compute a hash of it and send it to the 
> other replicas along with the PK range that it covers and a row count.
> * When the replicas receive the hash, they perform a read over the same PK 
> range (using a LIMIT of the row count + 1) and compare hashes (unless the row 
> counts don't match, in which case the hash comparison can be skipped).
>  * If there is a mismatch, the replica will send a mutation covering that 
> page's worth of data (ignoring the row count this time) to the source node.
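>
> To make the shape of that exchange concrete, here's a rough sketch (none of 
> these are real classes; every name is made up):
> {code:java}
> // Purely illustrative: one "leaf" is a page of rows over a contiguous
> // PK range; the start of page N+1 is the end of page N.
> import java.security.MessageDigest;
> import java.util.List;
>
> class MutationRepairSketch
> {
>     static class Page
>     {
>         final byte[] startPk;    // exclusive start, like a paging state
>         final byte[] endPk;      // inclusive end of this leaf
>         final List<byte[]> rows; // serialized rows in the page
>
>         Page(byte[] startPk, byte[] endPk, List<byte[]> rows)
>         {
>             this.startPk = startPk;
>             this.endPk = endPk;
>             this.rows = rows;
>         }
>     }
>
>     // Source node: hash the page, then ship (range, rowCount, hash)
>     // to the other replicas.
>     static byte[] hashPage(Page page) throws Exception
>     {
>         MessageDigest md5 = MessageDigest.getInstance("MD5");
>         for (byte[] row : page.rows)
>             md5.update(row);
>         return md5.digest();
>     }
>
>     // Replica: re-read the same PK range with LIMIT rowCount + 1 and
>     // compare.  A differing row count is already a mismatch, so hashing
>     // can be skipped; on any mismatch the replica responds with a
>     // mutation covering the page.
>     static boolean pageMatches(Page replicaPage, int sourceRowCount, byte[] sourceHash)
>         throws Exception
>     {
>         if (replicaPage.rows.size() != sourceRowCount)
>             return false;
>         return MessageDigest.isEqual(hashPage(replicaPage), sourceHash);
>     }
> }
> {code}
>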
> Here are the advantages that I can think of:
>  * With the current repair behavior of streaming, vnode-enabled clusters may 
> need to stream hundreds of small SSTables.  This results in increased 
> compaction load on the receiving node.  With the mutation-based approach, 
> memtables 
> would naturally merge these.
> * It's simple to throttle.  For example, you could give a number of rows/sec 
> that should be repaired (a sketch follows after this list).
>  * It's easy to see what PK range has been repaired so far.  This could make 
> it simpler to resume a repair that fails midway.
>  * Inconsistencies start to be repaired almost right away.
>  * Less special code \(?\)
>  * Wide partitions are no longer a problem.
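>
> The throttle really could be that simple; a sketch (RateLimiter is Guava, 
> which Cassandra already ships; the wrapper around it is hypothetical):
> {code:java}
> import com.google.common.util.concurrent.RateLimiter;
>
> class RepairThrottle
> {
>     private final RateLimiter limiter;
>
>     RepairThrottle(double rowsPerSecond)
>     {
>         this.limiter = RateLimiter.create(rowsPerSecond);
>     }
>
>     // Block until this page's rows fit under the configured rows/sec rate.
>     void onPageRead(int rowCount)
>     {
>         limiter.acquire(rowCount);
>     }
> }
> {code}
>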
> There are a few problems I can think of:
>  * Counters.  I don't know if this can be made safe, or if they need to be 
> skipped.
>  * To support incremental repair, we need to be able to read from only 
> repaired sstables.  Probably not too difficult to do.
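>
> For that last point, the read-side filter could be as small as the sketch 
> below (the SSTable interface here is a stand-in for the real sstable reader, 
> which carries a repairedAt timestamp in its metadata):
> {code:java}
> import java.util.List;
> import java.util.stream.Collectors;
>
> class RepairedOnlyFilter
> {
>     // Stand-in for the real sstable reader; only the one bit we need.
>     interface SSTable
>     {
>         boolean isRepaired(); // repairedAt set to something other than
>                               // the "unrepaired" sentinel
>     }
>
>     // Restrict an incremental-repair read to sstables already marked
>     // repaired, per the requirement above.
>     static List<SSTable> repairedOnly(List<SSTable> sstables)
>     {
>         return sstables.stream()
>                        .filter(SSTable::isRepaired)
>                        .collect(Collectors.toList());
>     }
> }
> {code}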


