[ https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699687#comment-13699687 ]
Jeremiah Jordan edited comment on CASSANDRA-5351 at 7/4/13 1:35 AM:
--------------------------------------------------------------------

Anti-compaction sounds like it could work. Then you really do just need an "am I repaired" flag, because during repair you anti-compact into "repaired" and "not repaired" data.

So something like:

1. Calculate merkle trees, anti-compacting each sstable into "data being repaired" and "data not being repaired" tmp sstables during the process. Set a flag in the "data being repaired" sstables to show them as repaired.
2. Perform merkle exchange/streaming, and flag tmp sstables coming in from streaming as repaired.
3. When the repair is done, convert all tmp sstables into real ones, and delete the originals.

Sstables involved in the repair would be marked "already compacting" so they won't participate in compaction during the repair. Since you don't promote from tmp to real until the repair completes successfully, if the node dies in the middle of the repair, all the tmp sstables will just be removed at startup.

Then only compact like sstables with like, so there will be two sets of sstables: "fully repaired" and "not repaired at all".

This is going to use a lot of disk IO for all the anti-compaction, but as long as you run repair often, it shouldn't be too bad, since it is cheap after the first time. We probably want to let people pick their repair strategy to begin with: this is going to hurt, disk IO and space wise, the first time you do it on an existing data set of 1 TB per node.

> Avoid repairing already-repaired data by default
> ------------------------------------------------
>
>                 Key: CASSANDRA-5351
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5351
>             Project: Cassandra
>          Issue Type: Task
>          Components: Core
>            Reporter: Jonathan Ellis
>              Labels: repair
>             Fix For: 2.1
>
>
> Repair has always built its merkle tree from all the data in a columnfamily,
> which is guaranteed to work but is inefficient.
> We can improve this by remembering which sstables have already been
> successfully repaired, and only repairing sstables new since the last repair.
> (This automatically makes CASSANDRA-3362 much less of a problem too.)
> The tricky part is, compaction will (if not taught otherwise) mix repaired
> data together with non-repaired. So we should segregate unrepaired sstables
> from the repaired ones.
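
For illustration only, here is a rough Python sketch of the anti-compaction flow proposed in the comment above. It is not Cassandra code: the SSTable class, the (start, end] token ranges, and the function names are all hypothetical stand-ins, and streaming (step 2) is left out. It only shows the split into tmp "repaired" / "not repaired" sstables, the promotion at the end of a successful repair, and the like-with-like grouping for compaction.

```python
from dataclasses import dataclass


@dataclass
class SSTable:
    """Hypothetical in-memory stand-in for an sstable: token -> value."""
    rows: dict
    repaired: bool = False  # the "am I repaired" flag
    tmp: bool = False       # tmp sstables are discarded if the repair dies


def in_ranges(token, ranges):
    """True if the token falls in any (start, end] range being repaired."""
    return any(start < token <= end for start, end in ranges)


def anti_compact(sstable, repair_ranges):
    """Step 1: split one sstable into tmp 'repaired' and 'not repaired' parts."""
    inside = {t: v for t, v in sstable.rows.items() if in_ranges(t, repair_ranges)}
    outside = {t: v for t, v in sstable.rows.items() if not in_ranges(t, repair_ranges)}
    return (SSTable(inside, repaired=True, tmp=True),
            SSTable(outside, repaired=False, tmp=True))


def finish_repair(live, originals, tmp_tables):
    """Step 3: promote tmp sstables to real ones and drop the originals."""
    kept = [s for s in live if all(s is not o for o in originals)]
    for t in tmp_tables:
        t.tmp = False
    # Empty halves are not worth keeping around.
    return kept + [t for t in tmp_tables if t.rows]


def compaction_groups(live):
    """Only compact like with like: repaired and unrepaired never mix."""
    return {
        "repaired": [s for s in live if s.repaired],
        "unrepaired": [s for s in live if not s.repaired],
    }
```

If the node dies before finish_repair runs, nothing has been promoted, so a startup pass that drops every sstable with tmp set would restore the pre-repair state, which is the crash-safety property the comment relies on.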