[
https://issues.apache.org/jira/browse/CASSANDRA-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15272689#comment-15272689
]
Jeremiah Jordan edited comment on CASSANDRA-8911 at 5/5/16 5:35 PM:
--------------------------------------------------------------------
bq. not sure here, would be nice to be able to prioritise a given range for a
table, say we lose an sstable for example, automatically repairing the range
that sstable covered immediately.
bq. It should probably be just a single thread, one table after the other. And
then maybe having a priority queue or something with ranges to repair
immediately, wdyt Paulo Motta? This prio queue thing might be a bit of gold
plating that we could do later.
This discussion should probably be on CASSANDRA-10070?
was (Author: jjordan):
bq. not sure here, would be nice to be able to prioritise a given range for a
table, say we lose an sstable for example, automatically repairing the range
that sstable covered immediately.
bq. It should probably be just a single thread, one table after the other. And
then maybe having a priority queue or something with ranges to repair
immediately, wdyt Paulo Motta? This prio queue thing might be a bit of gold
plating that we could do later.
This discussion should probably be on the other ticket?
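The "single repair thread plus a priority queue of ranges" idea quoted above could be sketched roughly as below. RangeTask and the priority scheme are made up for illustration and are not an existing Cassandra API; a lost sstable's range simply gets a higher priority than routine background work.
{code:java}
// Minimal sketch of a single repair thread draining a priority queue of ranges.
// RangeTask and the integer priorities are illustrative stand-ins only.
import java.util.concurrent.PriorityBlockingQueue;

public class RepairQueueSketch
{
    static class RangeTask implements Comparable<RangeTask>
    {
        final String table;
        final String range;     // e.g. a token or PK range, as a placeholder
        final int priority;     // lower value = repair sooner (0 = lost sstable)

        RangeTask(String table, String range, int priority)
        {
            this.table = table;
            this.range = range;
            this.priority = priority;
        }

        public int compareTo(RangeTask other)
        {
            return Integer.compare(priority, other.priority);
        }
    }

    public static void main(String[] args) throws InterruptedException
    {
        PriorityBlockingQueue<RangeTask> queue = new PriorityBlockingQueue<>();
        queue.add(new RangeTask("ks.t1", "(0, 100]", 10));   // routine background repair
        queue.add(new RangeTask("ks.t2", "(200, 300]", 0));  // range covered by a lost sstable

        // A single repair thread drains the queue, most urgent range first.
        while (!queue.isEmpty())
        {
            RangeTask task = queue.take();
            System.out.println("repairing " + task.table + " " + task.range);
        }
    }
}
{code}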
> Consider Mutation-based Repairs
> -------------------------------
>
> Key: CASSANDRA-8911
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8911
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Tyler Hobbs
> Assignee: Marcus Eriksson
> Fix For: 3.x
>
>
> We should consider a mutation-based repair to replace the existing streaming
> repair. While we're at it, we could do away with a lot of the complexity
> around merkle trees.
> I have not planned this out in detail, but here's roughly what I'm thinking:
> * Instead of building an entire merkle tree up front, just send the "leaves"
> one-by-one. Instead of dealing with token ranges, make the leaves primary
> key ranges. The PK ranges would need to be contiguous, so that the start of
> each range would match the end of the previous range. (The first and last
> leaves would need to be open-ended on one end of the PK range.) This would be
> similar to doing a read with paging.
> * Once one page of data is read, compute a hash of it and send it to the
> other replicas along with the PK range that it covers and a row count.
> * When the replicas receive the hash, they perform a read over the same PK
> range (using a LIMIT of the row count + 1) and compare hashes (unless the row
> counts don't match, in which case this can be skipped).
> * If there is a mismatch, the replica will send a mutation covering that
> page's worth of data (ignoring the row count this time) to the source node.
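The per-page flow in the steps above could look roughly like the following. This is a minimal sketch only: Page, hashPage and pageMatches are names made up for illustration, the digest choice and the transport between nodes are glossed over, and none of it is an actual Cassandra API.
{code:java}
// Minimal sketch of the per-page compare-and-repair flow described above.
// Page, hashPage and pageMatches are illustrative stand-ins, not Cassandra APIs.
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.List;

public class PageRepairSketch
{
    /** One "leaf": a contiguous primary-key range plus the rows read for that page. */
    static class Page
    {
        final String startPk;       // exclusive start; null = open-ended first leaf
        final String endPk;         // inclusive end; null = open-ended last leaf
        final List<String> rows;    // serialized rows, stand-in for real row data

        Page(String startPk, String endPk, List<String> rows)
        {
            this.startPk = startPk;
            this.endPk = endPk;
            this.rows = rows;
        }
    }

    /** Source node: hash one page; it would send (range, row count, digest) to the replicas. */
    static byte[] hashPage(Page page) throws Exception
    {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        for (String row : page.rows)
            digest.update(row.getBytes(StandardCharsets.UTF_8));
        return digest.digest();
    }

    /**
     * Replica side: re-read the same PK range (LIMIT rowCount + 1), skip the hash
     * comparison when the counts already differ, otherwise compare digests.
     */
    static boolean pageMatches(Page localPage, int remoteRowCount, byte[] remoteHash) throws Exception
    {
        if (localPage.rows.size() != remoteRowCount)
            return false;                           // count mismatch, no need to hash
        return Arrays.equals(hashPage(localPage), remoteHash);
    }

    public static void main(String[] args) throws Exception
    {
        Page source  = new Page(null, "k100", List.of("k001:v1", "k002:v2"));
        Page replica = new Page(null, "k100", List.of("k001:v1", "k002:stale"));

        if (!pageMatches(replica, source.rows.size(), hashPage(source)))
            System.out.println("mismatch: replica sends a mutation covering ("
                               + source.startPk + ", " + source.endPk + "]");
    }
}
{code}
How the mutation for a mismatched page is actually built and shipped back to the source node, and which digest is used, is deliberately left open here.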
> Here are the advantages that I can think of:
> * With the current repair behavior of streaming, vnode-enabled clusters may
> need to stream hundreds of small SSTables. This results in increased
> compaction load on the receiving node. With the mutation-based approach, memtables
> would naturally merge these.
> * It's simple to throttle. For example, you could specify a number of rows/sec
> that should be repaired.
> * It's easy to see what PK range has been repaired so far. This could make
> it simpler to resume a repair that fails midway.
> * Inconsistencies start to be repaired almost right away.
> * Less special code (?)
> * Wide partitions are no longer a problem.
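On the throttling point above, a rows/sec limit around the page reads could be as simple as the following token-bucket-style sketch; the class name and the fixed sleep interval are made up for illustration.
{code:java}
// Rough sketch of a rows-per-second throttle around page reads; purely illustrative.
// Assumes each acquire() asks for at most one page's worth of rows (<= rowsPerSecond).
public class RowRateThrottle
{
    private final double rowsPerSecond;
    private double allowance;
    private long lastCheckNanos = System.nanoTime();

    public RowRateThrottle(double rowsPerSecond)
    {
        this.rowsPerSecond = rowsPerSecond;
        this.allowance = rowsPerSecond;
    }

    /** Block until the caller may repair another `rows` rows. */
    public synchronized void acquire(int rows) throws InterruptedException
    {
        while (true)
        {
            long now = System.nanoTime();
            allowance = Math.min(rowsPerSecond,
                                 allowance + (now - lastCheckNanos) / 1e9 * rowsPerSecond);
            lastCheckNanos = now;
            if (allowance >= rows)
            {
                allowance -= rows;
                return;
            }
            Thread.sleep(10);   // wait for more allowance to accumulate
        }
    }
}
{code}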
> There are a few problems I can think of:
> * Counters. I don't know if this can be made safe, or if they need to be
> skipped.
> * To support incremental repair, we need to be able to read from only
> repaired sstables. Probably not too difficult to do.
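For the incremental-repair point, the repair read would just need to be restricted to sstables already marked repaired. The sketch below uses a hypothetical SSTable holder rather than Cassandra's real SSTableReader, purely to show the shape of the filter.
{code:java}
// Illustrative only: "SSTable" is a stand-in, not Cassandra's SSTableReader;
// the real change would hook this filter into the read path used by repair.
import java.util.List;
import java.util.stream.Collectors;

public class RepairedOnlyReads
{
    static class SSTable
    {
        final String name;
        final boolean repaired;     // e.g. repairedAt > 0 in the sstable metadata

        SSTable(String name, boolean repaired)
        {
            this.name = name;
            this.repaired = repaired;
        }
    }

    /** Restrict a repair read to sstables already marked repaired. */
    static List<SSTable> repairedOnly(List<SSTable> sstables)
    {
        return sstables.stream()
                       .filter(s -> s.repaired)
                       .collect(Collectors.toList());
    }
}
{code}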
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)