[ 
https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699687#comment-13699687
 ] 

Jeremiah Jordan edited comment on CASSANDRA-5351 at 7/4/13 1:35 AM:
--------------------------------------------------------------------

Anti-compaction sounds like it could work.
Then you really do just need an "am I repaired" flag, because during repair you 
anti-compact into "repaired" and "not repaired" data.
So something like:
1. Calculate merkle trees, anti-compacting each sstable into "data being 
repaired" and "data not being repaired" tmp sstables during the process.  Set a 
flag in the "data being repaired" sstables to show them as repaired.
2. Perform merkle exchange/streaming, flag tmp sstables coming in from 
streaming as repaired.
3. When the repair is done, convert all tmp sstables into real ones, and delete 
the originals.
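
A minimal sketch of the split in step 1, as a toy model only.  Everything 
here is hypothetical and not Cassandra's real code: an "sstable" is just a 
sorted map of token -> row, the repair range is a single non-wrapping 
(left, right] interval, and the class and field names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model: none of these names are Cassandra's real classes.
final class AntiCompactionSketch {

    // A stand-in for an sstable: sorted rows plus the two flags from the steps above.
    static final class ToySSTable {
        final TreeMap<Long, String> rows;
        final boolean repaired; // the "am I repaired" flag
        final boolean tmp;      // not yet promoted to a real sstable

        ToySSTable(TreeMap<Long, String> rows, boolean repaired, boolean tmp) {
            this.rows = rows;
            this.repaired = repaired;
            this.tmp = tmp;
        }
    }

    // Splits one sstable into two tmp sstables:
    //   index 0: rows whose token is in (left, right], flagged repaired
    //   index 1: all other rows, flagged not repaired
    // Assumes left < right (no wrap-around range).
    static List<ToySSTable> antiCompact(ToySSTable source, long left, long right) {
        TreeMap<Long, String> inRange = new TreeMap<>();
        TreeMap<Long, String> outOfRange = new TreeMap<>();
        for (Map.Entry<Long, String> e : source.rows.entrySet()) {
            long token = e.getKey();
            if (token > left && token <= right) {
                inRange.put(token, e.getValue());
            } else {
                outOfRange.put(token, e.getValue());
            }
        }
        List<ToySSTable> result = new ArrayList<>();
        result.add(new ToySSTable(inRange, true, true));
        result.add(new ToySSTable(outOfRange, false, true));
        return result;
    }
}
```

The source sstable is left untouched, which matches step 3: the originals 
are only deleted once the tmp outputs have been promoted.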

sstables involved in the repair would be marked "already compacting" so they 
won't participate in compaction during the repair.

Since you don't promote from tmp to real until the repair completes 
successfully, if the node dies in the middle of the repair, all the tmp 
sstables will just be removed at startup.
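
Roughly, the promote-or-discard behaviour could look like the sketch below.  
This is an illustration under assumptions, not how Cassandra names or tracks 
its files: the ".tmp" suffix and both method names are made up, and a real 
sstable consists of several component files that would need to be handled 
together.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the naming scheme is an assumption for illustration.
final class TmpPromotionSketch {

    static final String TMP_SUFFIX = ".tmp"; // assumed marker for tmp sstables

    // Repair finished: turn a tmp file into a real one by renaming it in place.
    static Path promote(Path tmpFile) throws IOException {
        String name = tmpFile.getFileName().toString();
        if (!name.endsWith(TMP_SUFFIX)) {
            throw new IllegalArgumentException("not a tmp file: " + tmpFile);
        }
        String realName = name.substring(0, name.length() - TMP_SUFFIX.length());
        Path real = tmpFile.resolveSibling(realName);
        return Files.move(tmpFile, real, StandardCopyOption.ATOMIC_MOVE);
    }

    // Startup: anything still carrying the tmp suffix belongs to a repair
    // that never completed, so delete it.  Returns the number of files removed.
    static int removeLeftoverTmpFiles(Path dir) throws IOException {
        List<Path> leftovers = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*" + TMP_SUFFIX)) {
            for (Path p : stream) {
                leftovers.add(p);
            }
        }
        for (Path p : leftovers) {
            Files.delete(p);
        }
        return leftovers.size();
    }
}
```

Because the originals are only deleted after promotion, a crash before that 
point leaves the node with exactly the sstables it had before the repair.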

Then only compact like sstables with like, so there will be two sets of 
sstables: "fully repaired" and "not repaired at all".
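
As a sketch of what "only compact like sstables" could mean for candidate 
selection (hypothetical names again, not the real compaction strategy API): 
skip anything marked as already compacting, then bucket the rest by the 
repaired flag so that a compaction only ever draws from one bucket.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical model of compaction candidate selection.
final class LikeCompactionSketch {

    static final class Candidate {
        final String name;
        final boolean repaired;   // the "am I repaired" flag
        final boolean compacting; // marked "already compacting", e.g. by a running repair

        Candidate(String name, boolean repaired, boolean compacting) {
            this.name = name;
            this.repaired = repaired;
            this.compacting = compacting;
        }
    }

    // Returns two buckets keyed by the repaired flag (true / false).
    // Sstables that are already compacting are left out of both.
    static Map<Boolean, List<Candidate>> bucketsByRepairedFlag(List<Candidate> sstables) {
        return sstables.stream()
                .filter(c -> !c.compacting)
                .collect(Collectors.partitioningBy(c -> c.repaired));
    }
}
```

A compaction task would then be built from `buckets.get(true)` or 
`buckets.get(false)`, never from a mix of the two.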

This is going to use a lot of disk I/O for all the anti-compaction, but as long 
as you run repair often it shouldn't be too bad, since repair is cheap after 
the first time.  We probably want to let people pick their repair strategy to 
begin with: the first run is going to hurt, disk I/O and space wise, on an 
already existing data set of 1 TB per node...
                
> Avoid repairing already-repaired data by default
> ------------------------------------------------
>
>                 Key: CASSANDRA-5351
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5351
>             Project: Cassandra
>          Issue Type: Task
>          Components: Core
>            Reporter: Jonathan Ellis
>              Labels: repair
>             Fix For: 2.1
>
>
> Repair has always built its merkle tree from all the data in a columnfamily, 
> which is guaranteed to work but is inefficient.
> We can improve this by remembering which sstables have already been 
> successfully repaired, and only repairing sstables new since the last repair. 
>  (This automatically makes CASSANDRA-3362 much less of a problem too.)
> The tricky part is, compaction will (if not taught otherwise) mix repaired 
> data together with non-repaired.  So we should segregate unrepaired sstables 
> from the repaired ones.

