[
https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699653#comment-13699653
]
Jonathan Ellis commented on CASSANDRA-5351:
-------------------------------------------
bq. What about sstables that share ranges but have different repaired states?
They can't be combined since that would screw up the repaired state one way or
the other.
Yes, I think this is a catch-22 scenario -- if we do combine them, then we
throw away the repair state we went to so much trouble to generate; if we
don't, then we kill read performance.
I think an "anticompaction" approach is more promising -- write out one set of
sstables that is all repaired, and another set that is all unrepaired. Then
we can allow combining within both sets. So basically, we spend some i/o doing
the anticompaction, to reduce the i/o required by all future repairs. Seems
reasonable on paper.
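To make the anticompaction idea concrete, here is a minimal sketch of splitting one sstable's rows into a purely-repaired and a purely-unrepaired output, so each resulting sstable carries a single unambiguous repaired state. All names here (Row, Range, anticompact) are illustrative, not Cassandra's actual classes:

```java
import java.util.*;

public class AnticompactionSketch {
    // Illustrative stand-ins for an sstable row and a repaired token range.
    record Row(long token, String data) {}
    record Range(long left, long right) {
        boolean contains(long token) { return token >= left && token < right; }
    }

    // Split one sstable's rows into "repaired" and "unrepaired" outputs,
    // depending on whether each row's token falls in a successfully
    // repaired range. Both outputs can then be compacted freely within
    // their own set without mixing repair states.
    static Map<String, List<Row>> anticompact(List<Row> sstable,
                                              List<Range> repairedRanges) {
        Map<String, List<Row>> out = new HashMap<>();
        out.put("repaired", new ArrayList<>());
        out.put("unrepaired", new ArrayList<>());
        for (Row r : sstable) {
            boolean repaired = repairedRanges.stream()
                                             .anyMatch(rg -> rg.contains(r.token()));
            out.get(repaired ? "repaired" : "unrepaired").add(r);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row(5, "a"), new Row(15, "b"), new Row(25, "c"));
        List<Range> repaired = List.of(new Range(0, 10), new Range(20, 30));
        Map<String, List<Row>> split = anticompact(rows, repaired);
        System.out.println(split.get("repaired"));   // rows with tokens 5 and 25
        System.out.println(split.get("unrepaired")); // row with token 15
    }
}
```

The extra i/o is the rewrite itself: every row in the input is copied to one of the two outputs, which is the up-front cost traded for cheaper future repairs.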
As for what to do with compaction, either locking the sstables during the
repair or skipping any that happen to get compacted concurrently should be
workable, but I'd lean towards the latter.
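The "skip the concurrently compacted" option can be sketched as: snapshot the candidate sstable set when the repair starts, then validate only the ones still live when the merkle tree is built. This is a hypothetical illustration, assuming sstables are identified by name; it is not the actual Cassandra API:

```java
import java.util.*;

public class SkipCompactedSketch {
    // Intersect the repair-start snapshot with the currently live set;
    // any sstable compacted away in the meantime is silently dropped
    // from validation rather than blocking compaction for the whole repair.
    static Set<String> toValidate(Set<String> snapshotAtRepairStart,
                                  Set<String> liveNow) {
        Set<String> survivors = new HashSet<>(snapshotAtRepairStart);
        survivors.retainAll(liveNow);
        return survivors;
    }

    public static void main(String[] args) {
        Set<String> snapshot = Set.of("sst-1", "sst-2", "sst-3");
        Set<String> live = Set.of("sst-1", "sst-3", "sst-4"); // sst-2 compacted away
        System.out.println(toValidate(snapshot, live)); // sst-1 and sst-3 remain
    }
}
```

The data from a skipped sstable isn't lost -- it lives on in the compaction output, which stays unrepaired and is simply picked up by the next repair round.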
> Avoid repairing already-repaired data by default
> ------------------------------------------------
>
> Key: CASSANDRA-5351
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5351
> Project: Cassandra
> Issue Type: Task
> Components: Core
> Reporter: Jonathan Ellis
> Labels: repair
> Fix For: 2.1
>
>
> Repair has always built its merkle tree from all the data in a columnfamily,
> which is guaranteed to work but is inefficient.
> We can improve this by remembering which sstables have already been
> successfully repaired, and only repairing sstables new since the last repair.
> (This automatically makes CASSANDRA-3362 much less of a problem too.)
> The tricky part is, compaction will (if not taught otherwise) mix repaired
> data together with non-repaired. So we should segregate unrepaired sstables
> from the repaired ones.
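The core idea of the ticket -- remember which sstables were successfully repaired and feed only the rest into the next merkle tree -- can be sketched as follows. The SSTable record and method names are illustrative assumptions, not Cassandra internals:

```java
import java.util.*;

public class IncrementalRepairSketch {
    // Illustrative sstable with a persisted repaired flag.
    record SSTable(String name, boolean repaired) {}

    // Only sstables not yet marked repaired contribute to the merkle
    // tree, so repair cost tracks new data instead of total data.
    static List<SSTable> candidatesForRepair(List<SSTable> all) {
        return all.stream().filter(s -> !s.repaired()).toList();
    }

    // After a successful repair, mark the candidates repaired so the
    // next run can skip them entirely.
    static List<SSTable> markRepaired(List<SSTable> candidates) {
        return candidates.stream()
                         .map(s -> new SSTable(s.name(), true))
                         .toList();
    }

    public static void main(String[] args) {
        List<SSTable> all = List.of(new SSTable("sst-1", true),
                                    new SSTable("sst-2", false));
        List<SSTable> todo = candidatesForRepair(all);
        System.out.println(todo); // only sst-2 needs validation
        System.out.println(markRepaired(todo));
    }
}
```

This is exactly why compaction must not mix the two sets: a merged sstable with ambiguous state would either get re-repaired needlessly or wrongly skipped.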