[ 
https://issues.apache.org/jira/browse/CASSANDRA-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13579252#comment-13579252
 ] 

Ahmed Bashir commented on CASSANDRA-5263:
-----------------------------------------

Setup: 8-node cluster, with 7 nodes containing 1.5M rows for a total load of 
~360MB.  The 8th node was lacking 30,000 rows, a total of 60MB missing.

With the default depth (15), we streamed 275MB to node #8, and tree computation 
took ~3min.
With a depth of 17, we streamed 151MB to node #8, and tree computation took 
12min.  However, we saved time during streaming and subsequent compactions.
With a depth of 20 (extreme case), we streamed just 69MB to node #8, and tree 
computation took 21min.
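
To put the three runs on a common scale, the sketch below divides the streamed 
volume by the 60MB that was actually missing.  The figures are the ones 
reported above; the names STREAMED_MB, MISSING_MB and overstream_factor are 
made up for this illustration.

```python
# Streamed volume per Merkle tree depth, in MB, as reported in the runs above.
STREAMED_MB = {15: 275, 17: 151, 20: 69}
MISSING_MB = 60  # data that was actually absent from node #8


def overstream_factor(streamed_mb, missing_mb=MISSING_MB):
    """Ratio of data streamed to data that was really missing."""
    return streamed_mb / missing_mb


for depth, mb in sorted(STREAMED_MB.items()):
    factor = overstream_factor(mb)
    print(f"depth {depth}: streamed {mb}MB, {factor:.2f}x the missing data")
```

By this measure the default depth streamed roughly 4.6 times the missing data, 
depth 17 roughly 2.5 times, and depth 20 about 1.15 times.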

We then tested with a multithreaded version of createPendingFiles(), which 
brought the tree computation time down significantly.  With that change, 
repair at depth 20 actually ran faster overall than at the default depth once 
you sum up tree computation time, streaming time, etc.
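
The multithreaded change itself is not reproduced here.  As a generic 
illustration of why this kind of work parallelizes, the sketch below hashes 
independent token ranges on a thread pool and checks that the result matches a 
serial pass.  Everything in it (hash_range, leaf_hashes, the sample data, the 
choice of SHA-256) is a hypothetical stand-in, not Cassandra code.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_range(rows):
    """Digest for one leaf: hash the rows of a single token range in order."""
    digest = hashlib.sha256()
    for key, value in rows:
        digest.update(key)
        digest.update(value)
    return digest.digest()


def leaf_hashes(ranges, workers=4):
    """Hash every range on a thread pool; map() keeps results in range order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(hash_range, ranges))


# Hypothetical data: four token ranges, each a list of (key, value) byte pairs.
ranges = [
    [(b"k%d" % i, b"v%d" % i) for i in range(start, start + 3)]
    for start in (0, 3, 6, 9)
]

parallel = leaf_hashes(ranges)
serial = [hash_range(r) for r in ranges]
print("parallel result matches serial:", parallel == serial)
```

Because each range is hashed independently, the ranges can be processed in any 
order or concurrently without changing the resulting leaf hashes.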


                
> Allow Merkle tree maximum depth to be configurable
> --------------------------------------------------
>
>                 Key: CASSANDRA-5263
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5263
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Config
>    Affects Versions: 1.1.9
>            Reporter: Ahmed Bashir
>
> Currently, the maximum depth allowed for Merkle trees is hardcoded as 15.  
> This value should be configurable, just like phi_convict_threshold and other 
> properties.
> Given a cluster with nodes responsible for a large number of row keys, Merkle 
> tree comparisons can result in a large number of unnecessary row keys being 
> streamed.
> Empirical testing indicates that reasonable changes to this depth (18, 20, 
> etc) don't affect the Merkle tree generation and differencing timings all 
> that much, and they can significantly reduce the amount of data being 
> streamed during repair. 
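
The effect described in the issue can be estimated with simple arithmetic.  A 
binary tree of depth d has at most 2^d leaves, and a mismatch in one leaf marks 
that leaf's whole range for streaming.  The sketch below assumes a fully split 
tree with evenly distributed keys, and uses a row count of 1.5M purely as an 
illustrative figure borrowed from the test setup in the comment; rows_per_leaf 
is a made-up helper name.

```python
def rows_per_leaf(row_count, depth):
    """Average rows covered by one leaf of a fully split tree of this depth."""
    return row_count / (2 ** depth)


ROWS = 1_500_000  # illustrative assumption, not a measured per-node figure

for depth in (15, 18, 20):
    leaves = 2 ** depth
    print(f"depth {depth}: {leaves} leaves, "
          f"~{rows_per_leaf(ROWS, depth):.1f} rows per leaf")
```

Under these assumptions a single differing row drags along about 46 rows at 
depth 15, but fewer than 2 at depth 20.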

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
