[ 
https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710119#comment-13710119
 ] 

Simon Guindon commented on CASSANDRA-5351:
------------------------------------------

{{nodetool repair -pr}} is taking more than 24 hours on our nodes, which ties 
up significant resources while users are active. I'm not as familiar with the 
Cassandra internals as I'd like to be, so I was wondering whether someone could 
shed light on whether this work improves that situation. After reading the 
details, it sounds like it will. 

Does this avoid rebuilding the merkle tree from scratch on every repair when 
large portions of the data haven't changed?

Here's our environment, in case it matters:
* 840 nodes
* 30 billion rows
* Replication factor of 2
* 75 million rows inserted each night
* 75 million rows deleted each night
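
For a rough sense of scale, the figures above can be turned into a nightly 
churn fraction. This is only a back-of-the-envelope sketch: it assumes the 
inserted and deleted rows are distinct, and the function name is made up for 
illustration.

```python
def nightly_churn_fraction(total_rows, inserted, deleted):
    """Fraction of all rows touched by one night's inserts and deletes.

    Assumes inserts and deletes hit distinct rows (an assumption, not
    something measured on the cluster).
    """
    return (inserted + deleted) / total_rows


# Figures taken from the environment list above.
churn = nightly_churn_fraction(
    total_rows=30_000_000_000,
    inserted=75_000_000,
    deleted=75_000_000,
)
print(f"{churn:.2%} of rows change per night")
```

Under those assumptions only about half a percent of the rows change per 
night, which is why rebuilding the tree over all of the data looks wasteful.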

                
> Avoid repairing already-repaired data by default
> ------------------------------------------------
>
>                 Key: CASSANDRA-5351
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5351
>             Project: Cassandra
>          Issue Type: Task
>          Components: Core
>            Reporter: Jonathan Ellis
>              Labels: repair
>             Fix For: 2.1
>
>
> Repair has always built its merkle tree from all the data in a columnfamily, 
> which is guaranteed to work but is inefficient.
> We can improve this by remembering which sstables have already been 
> successfully repaired, and only repairing sstables new since the last repair. 
>  (This automatically makes CASSANDRA-3362 much less of a problem too.)
> The tricky part is, compaction will (if not taught otherwise) mix repaired 
> data together with non-repaired.  So we should segregate unrepaired sstables 
> from the repaired ones.
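
The idea in the description above can be sketched roughly as follows. This is 
a toy model for illustration only, not Cassandra's implementation: the 
SSTable class, its repaired flag, and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    """Toy stand-in for an sstable; `repaired` is a hypothetical flag."""
    name: str
    repaired: bool = False


def sstables_to_repair(sstables):
    """Only sstables not yet marked repaired feed the next merkle tree."""
    return [s for s in sstables if not s.repaired]


def mark_repaired(sstables, repaired_names):
    """After a successful repair, flag the sstables that took part in it."""
    return [
        SSTable(s.name, s.repaired or s.name in repaired_names)
        for s in sstables
    ]


def compaction_pools(sstables):
    """Split sstables into (repaired, unrepaired) pools so that compaction
    can work on each pool separately and never mix the two."""
    repaired = [s for s in sstables if s.repaired]
    unrepaired = [s for s in sstables if not s.repaired]
    return repaired, unrepaired


# First repair covers everything; the next one only sees what is new.
tables = [SSTable("a"), SSTable("b")]
pending = sstables_to_repair(tables)
tables = mark_repaired(tables, {s.name for s in pending})
tables.append(SSTable("c"))  # flushed after the repair finished
print([s.name for s in sstables_to_repair(tables)])
```

In this model the second repair only has to consider "c", and keeping the two 
pools apart is what prevents compaction from folding repaired data back into 
sstables that still need repair.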

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
