[ https://issues.apache.org/jira/browse/CASSANDRA-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890449#comment-13890449 ]

Minh Do commented on CASSANDRA-5263:
------------------------------------

If I understand correctly, you are saying that if N is the total number of rows 
in all SSTables on a node for a given token range, then depth = log N with log 
base 2?  That works as long as a node does not hold too many rows.  Can we 
safely assume that a node holds no more than 2^24 rows (about 16.7M)?  At that 
count we would need to build a Merkle tree of depth 24, which requires about 
1.6G of heap, and beyond it I would say we run into heap allocation issues.  I 
was thinking earlier that depth 20 is the maximum allowable depth, and I worked 
my way down from there to compute lower-depth trees.
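
For a rough sanity check on those numbers: a full binary tree of depth d has 
2^(d+1) - 1 nodes, so at an assumed ~48 bytes per node (hash bytes plus JVM 
object overhead; an illustrative figure, not a measured Cassandra value) depth 
24 lands right around the 1.6G mentioned above.  A minimal sketch:

{code}
// Rough heap estimate for a full Merkle tree of a given depth.
// BYTES_PER_NODE is an assumption (hash bytes plus JVM object
// overhead), not a measured Cassandra value.
public class MerkleHeapEstimate
{
    private static final long BYTES_PER_NODE = 48;

    // A full binary tree of depth d has 2^(d+1) - 1 nodes.
    static long estimatedHeapBytes(int depth)
    {
        long nodeCount = (1L << (depth + 1)) - 1;
        return nodeCount * BYTES_PER_NODE;
    }

    public static void main(String[] args)
    {
        for (int depth : new int[] { 15, 20, 24 })
            System.out.printf("depth %d -> ~%.2f GB%n",
                              depth, estimatedHeapBytes(depth) / (1024.0 * 1024 * 1024));
    }
}
{code}

By this estimate depth 15 needs only ~3M, depth 20 ~100M, and depth 24 ~1.5G, 
so the heap cost roughly doubles with each extra level of depth.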


> Allow Merkle tree maximum depth to be configurable
> --------------------------------------------------
>
>                 Key: CASSANDRA-5263
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5263
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Config
>    Affects Versions: 1.1.9
>            Reporter: Ahmed Bashir
>            Assignee: Minh Do
>
> Currently, the maximum depth allowed for Merkle trees is hardcoded as 15.  
> This value should be configurable, just like phi_convict_threshold and other 
> properties.
> Given a cluster whose nodes are each responsible for a large number of row 
> keys, Merkle tree comparisons can result in a large number of unnecessary row 
> keys being streamed.
> Empirical testing indicates that reasonable increases to this depth (18, 20, 
> etc.) don't affect the Merkle tree generation and differencing timings all 
> that much, and they can significantly reduce the amount of data being 
> streamed during repair. 
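
To make the streaming overhead concrete (the row counts here are assumptions 
for illustration, not figures from the ticket): a depth-15 tree has 2^15 = 
32,768 leaves, so on a node owning 100M rows each leaf covers roughly 3,000 
rows, and a single mismatched row forces all ~3,000 of them to stream.  At 
depth 20 (2^20, about 1M leaves) the same range splits into leaves of roughly 
95 rows, which is why a modest increase in depth can cut streamed data 
substantially.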



