[ https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699687#comment-13699687 ]
Jeremiah Jordan commented on CASSANDRA-5351:
--------------------------------------------
Anti compaction sounds like it could work.
Then you really do just need an "am I repaired" flag, because during repair you
anti-compact into "repaired" and "not repaired" data.
So something like:
1. Calculate merkle trees, anti-compacting each sstable into "data being
repaired" and "data not being repaired" tmp sstables during the process. Set a
flag in the "data being repaired" sstables to mark them as repaired.
2. Perform the merkle tree exchange and streaming, and flag the tmp sstables
coming in from streaming as repaired.
3. When the repair is done, convert all tmp sstables into real ones and delete
the originals.
The sstables involved in the repair would be marked "already compacting" so
they won't participate in compaction during the repair.
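Steps 1 and 2 can be sketched roughly as follows. This is a minimal Python illustration under loudly stated assumptions, not Cassandra's implementation: `SSTable`, `anticompact`, `start_repair` and `receive_streamed` are hypothetical names, an sstable is modeled as an in-memory dict of integer token to value, and the repair range is treated as (start, end] with no token-ring wraparound.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SSTable:
    # Hypothetical stand-in for an sstable: rows keyed by integer token.
    name: str
    rows: dict[int, str]
    repaired: bool = False  # the proposed "am I repaired" flag
    tmp: bool = False       # tmp sstables are not promoted until repair succeeds


def in_range(token: int, start: int, end: int) -> bool:
    # Assumed (start, end] range; token-ring wraparound is ignored here.
    return start < token <= end


def anticompact(sst: SSTable, start: int, end: int) -> tuple[SSTable, SSTable]:
    """Step 1: split one sstable into "being repaired" / "not being repaired" tmp sstables."""
    inside = {t: v for t, v in sst.rows.items() if in_range(t, start, end)}
    outside = {t: v for t, v in sst.rows.items() if not in_range(t, start, end)}
    being_repaired = SSTable(f"{sst.name}-repaired", inside, repaired=True, tmp=True)
    not_repaired = SSTable(f"{sst.name}-unrepaired", outside, repaired=False, tmp=True)
    return being_repaired, not_repaired


def start_repair(live: list[SSTable], start: int, end: int,
                 compacting: set[str]) -> list[SSTable]:
    """Anti-compact every unrepaired sstable and mark the originals as compacting."""
    tmp: list[SSTable] = []
    for sst in live:
        if sst.repaired:
            continue  # already-repaired data is left alone
        compacting.add(sst.name)  # keeps the original out of regular compaction
        tmp.extend(anticompact(sst, start, end))
    return tmp


def receive_streamed(name: str, rows: dict[int, str]) -> SSTable:
    """Step 2: data streamed in from a peer during repair is flagged as repaired."""
    return SSTable(name, dict(rows), repaired=True, tmp=True)
```

Only the originals are added to the "compacting" set, so regular compaction keeps running on everything else while the repair is in progress.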
Since you don't promote tmp sstables to real ones until the repair completes
successfully, if the node dies in the middle of the repair, all the tmp
sstables will just be removed at startup.
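The promote-on-success rule, together with the startup cleanup that makes a crash mid-repair harmless, might look like this. It is a hedged sketch: `SSTable`, `finish_repair` and `startup_cleanup` are hypothetical names, and "on disk" is modeled as a plain list rather than real sstable files.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SSTable:
    # Hypothetical stand-in: only the fields this step needs.
    name: str
    repaired: bool = False
    tmp: bool = False


def finish_repair(live: list[SSTable], tmp: list[SSTable], originals: set[str],
                  compacting: set[str], success: bool) -> list[SSTable]:
    """Step 3: promote tmp sstables and drop the originals only if repair succeeded."""
    if success:
        for sst in tmp:
            sst.tmp = False  # promote tmp -> real
        live = [s for s in live if s.name not in originals] + tmp
    # On failure the tmp sstables are never added and the originals stay live.
    compacting.difference_update(originals)
    return live


def startup_cleanup(on_disk: list[SSTable]) -> list[SSTable]:
    """Anything still marked tmp at startup belongs to a repair that never finished."""
    return [s for s in on_disk if not s.tmp]
```

Because nothing is deleted until the tmp sstables have been promoted, there is no window in which a crash can lose data; at worst the anti-compaction work is redone by the next repair.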
Then only compact like sstables together, so there will be two sets of
sstables: "fully repaired" and "not repaired at all".
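The "only compact like sstables" rule amounts to a partitioning constraint on compaction candidates. The Python sketch below illustrates only that constraint: `SSTable`, `compaction_buckets` and `compact` are hypothetical names, and a real compaction strategy (size-tiered, leveled) would still choose candidates within each bucket by its own criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    # Hypothetical stand-in: only the name and the repaired flag matter here.
    name: str
    repaired: bool


def compaction_buckets(live, compacting):
    """Group compaction candidates so repaired and unrepaired sstables never mix."""
    buckets = {True: [], False: []}
    for sst in live:
        if sst.name in compacting:
            continue  # e.g. currently being anti-compacted by a running repair
        buckets[sst.repaired].append(sst)
    return buckets


def compact(group):
    """Merge a group of like sstables; the output inherits their shared flag."""
    flags = {s.repaired for s in group}
    if len(flags) != 1:
        raise ValueError("refusing to mix repaired and unrepaired sstables")
    return SSTable("+".join(s.name for s in group), flags.pop())
```

Carrying the flag over to the compaction output is what keeps the two sets stable over time: a compaction can never turn repaired data back into "unrepaired" or vice versa.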
> Avoid repairing already-repaired data by default
> ------------------------------------------------
>
> Key: CASSANDRA-5351
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5351
> Project: Cassandra
> Issue Type: Task
> Components: Core
> Reporter: Jonathan Ellis
> Labels: repair
> Fix For: 2.1
>
>
> Repair has always built its merkle tree from all the data in a columnfamily,
> which is guaranteed to work but is inefficient.
> We can improve this by remembering which sstables have already been
> successfully repaired, and only repairing sstables new since the last repair.
> (This automatically makes CASSANDRA-3362 much less of a problem too.)
> The tricky part is, compaction will (if not taught otherwise) mix repaired
> data together with non-repaired. So we should segregate unrepaired sstables
> from the repaired ones.
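The description's core idea, building the merkle tree only from sstables not yet flagged as repaired, can be sketched as follows. This is a minimal Python illustration under stated assumptions: `SSTable`, `leaf_hashes` and `merkle_root` are hypothetical names, rows are an in-memory dict of integer token to value, duplicate tokens are resolved by "last sstable wins" rather than by timestamp reconciliation, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SSTable:
    # Hypothetical stand-in: rows keyed by integer token.
    name: str
    rows: dict = field(default_factory=dict)
    repaired: bool = False


def leaf_hashes(sstables, start, end, leaves):
    """Hash `leaves` equal subranges of (start, end], reading only unrepaired sstables."""
    merged = {}
    for sst in sstables:
        if sst.repaired:
            continue  # already-repaired data is excluded from the tree
        merged.update(sst.rows)  # simplification: last sstable wins on duplicate tokens
    digests = [hashlib.sha256() for _ in range(leaves)]
    for token in sorted(merged):
        if start < token <= end:
            leaf = (token - start - 1) * leaves // (end - start)
            digests[leaf].update(f"{token}:{merged[token]}".encode())
    return [d.digest() for d in digests]


def merkle_root(level):
    """Pairwise-hash leaf digests up to a single root (odd levels repeat the last node)."""
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        level = [hashlib.sha256(a + b).digest() for a, b in zip(level[::2], level[1::2])]
    return level[0]
```

With this filter, two replicas whose unrepaired data agrees produce the same root even if their repaired sstables differ, since that data is skipped rather than re-validated on every repair.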