[
https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661478#comment-13661478
]
Charlie Groves commented on CASSANDRA-5351:
-------------------------------------------
I've been looking at implementing this, and either I'm not understanding how it
works or it needs an extra wrinkle to keep from streaming a lot of data around.
Given nodes 1, 2 and 3, each of which is a replica for the same range of keys,
my understanding is that this style of repair would play out like this:
# Run repair on 1 and it's just like current repair: 1 streams the sections of
sstables for its divergent ranges to 2 and 3 and they stream their versions of
the divergent ranges back to 1
# 1 marks its initial sstables and the ones it received as repaired
# Run repair on 2. It streams back and forth with 3 in the same fashion as in
step 1. Node 1 doesn't include the sstables it repaired, so the merkle trees
of 1 and 2 are mostly different, and 1 and 2 stream the majority of their
unrepaired sstables to each other
# 2 marks its initial sstables and the ones it received as repaired
# Run repair on 3, and neither 1 nor 2 sends its repaired sstables. All the
trees are quite divergent, so both 1 and 2 send their unrepaired sstables to 3
and 3 sends its own to 1 and 2.
If you add more replicas, you stream the majority of the sstables for each
repaired node until you move to a node that isn't replicating the same range.
Am I missing something? It seems like the amount of data streamed would knock
out much of the benefit of not reading the repaired data.
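
To make the concern concrete, here is a rough toy model of the steps above. It is only a sketch under loud assumptions: a node's data is a set of keys, its "merkle tree" is just the set of unrepaired keys, and a repair compares the coordinator's tree with each neighbor's. The names Node and repair are hypothetical and are not Cassandra classes.

```python
# Toy model of the naive scheme: repaired data is left out of the tree.
# Everything here is hypothetical and exists only for illustration.

class Node:
    def __init__(self, name, keys):
        self.name = name
        self.unrepaired = set(keys)
        self.repaired = set()

    def tree(self):
        # Stand-in for a merkle tree: only unrepaired keys are covered.
        return frozenset(self.unrepaired)


def repair(coordinator, neighbors):
    """Repair coordinator against neighbors; return the number of keys streamed."""
    # Snapshot all trees before any streaming happens.
    coordinator_tree = coordinator.tree()
    neighbor_trees = [(n, n.tree()) for n in neighbors]
    streamed = 0
    for neighbor, neighbor_tree in neighbor_trees:
        to_neighbor = coordinator_tree - neighbor_tree
        to_coordinator = neighbor_tree - coordinator_tree
        neighbor.unrepaired |= to_neighbor
        coordinator.unrepaired |= to_coordinator
        streamed += len(to_neighbor) + len(to_coordinator)
    # The coordinator marks its initial and received data as repaired.
    coordinator.repaired |= coordinator.unrepaired
    coordinator.unrepaired = set()
    return streamed


# Three replicas that start out fully in sync.
keys = range(100)
n1, n2, n3 = Node(1, keys), Node(2, keys), Node(3, keys)
streamed_per_repair = [
    repair(n1, [n2, n3]),
    repair(n2, [n1, n3]),
    repair(n3, [n1, n2]),
]
```

In this model the second and third repairs each stream all 100 keys even though the replicas never diverged, which is the overhead described above.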
If the above is the case, I was thinking it could be fixed by adding a
"generation" to repairs. You supply a generation number to the repair command
and all sstables repaired in that run are marked as repaired in that
generation. The generation is sent to all the neighbor nodes along with the
repair request, and they build their merkle trees from any unrepaired ranges
and any ranges repaired at that generation or higher. Compaction would create new
sstables at the highest generation number of its source sstables. Once you've
repaired all the way around the ring, you'd increment the generation number.
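
The selection and compaction rules in this proposal could be sketched roughly as follows. This is an illustration only: SSTable, repaired_generation, sstables_for_tree and compacted_generation are hypothetical names, and the sketch assumes compaction only ever merges repaired sstables, since the issue suggests keeping repaired and unrepaired sstables apart.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SSTable:
    name: str
    # None means the sstable has never been repaired (hypothetical field).
    repaired_generation: Optional[int] = None


def sstables_for_tree(sstables: List[SSTable], generation: int) -> List[SSTable]:
    """Pick the sstables a neighbor would feed into its merkle tree."""
    return [
        s for s in sstables
        if s.repaired_generation is None or s.repaired_generation >= generation
    ]


def compacted_generation(sources: List[SSTable]) -> int:
    """Generation for a compaction result: the highest among its sources."""
    if any(s.repaired_generation is None for s in sources):
        raise ValueError("this sketch assumes only repaired sstables are compacted together")
    return max(s.repaired_generation for s in sources)


tables = [
    SSTable("a"),                         # unrepaired
    SSTable("b", repaired_generation=1),  # repaired in an older round
    SSTable("c", repaired_generation=2),  # repaired in the current round
    SSTable("d", repaired_generation=3),
]
selected = [s.name for s in sstables_for_tree(tables, generation=2)]
merged_generation = compacted_generation([tables[1], tables[2]])
```

With generation 2, the older repaired sstable "b" is skipped while unrepaired data and data repaired in the current round are still compared.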
We could even use the repairedAt timestamp as the generation: don't give the
first node repaired in the ring for this round a generation, and it returns its
timestamp when it's done. Pass that timestamp as the generation around the
ring, and they're all on the same generation afterwards.
"--include-previously-repaired" could be implemented as repairing with
generation 0.
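
A rough sketch of how a round might pass the timestamp around the ring, again under assumptions: repair_ring and the repair_node callback are hypothetical, and repair_node is assumed to return the repairedAt timestamp it used. Passing generation=0 models the "--include-previously-repaired" case, since every repaired sstable would be at generation 0 or higher.

```python
def repair_ring(nodes, repair_node, generation=None):
    """Repair each node in ring order, sharing one generation for the round.

    repair_node(node, generation) is a hypothetical callback that repairs one
    node and returns the generation (repairedAt timestamp) it ended up using.
    """
    for node in nodes:
        result = repair_node(node, generation)
        if generation is None:
            # The first node had no generation; adopt the timestamp it returns.
            generation = result
    return generation


calls = []

def fake_repair_node(node, generation):
    # Stand-in for a real repair: record the call and invent a timestamp.
    calls.append((node, generation))
    return 1368921600000 if generation is None else generation


round_generation = repair_ring(["node1", "node2", "node3"], fake_repair_node)
```

Here the first node is repaired without a generation, and the timestamp it returns is reused for the remaining nodes in the round.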
> Avoid repairing already-repaired data by default
> ------------------------------------------------
>
> Key: CASSANDRA-5351
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5351
> Project: Cassandra
> Issue Type: Task
> Components: Core
> Reporter: Jonathan Ellis
> Labels: repair
> Fix For: 2.0
>
>
> Repair has always built its merkle tree from all the data in a columnfamily,
> which is guaranteed to work but is inefficient.
> We can improve this by remembering which sstables have already been
> successfully repaired, and only repairing sstables new since the last repair.
> (This automatically makes CASSANDRA-3362 much less of a problem too.)
> The tricky part is, compaction will (if not taught otherwise) mix repaired
> data together with non-repaired. So we should segregate unrepaired sstables
> from the repaired ones.