[
https://issues.apache.org/jira/browse/CASSANDRA-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869639#comment-13869639
]
Jonathan Ellis commented on CASSANDRA-5263:
-------------------------------------------
You can estimate the rows (partitions) in a range from the index sample;
SSTableReader (SSTR).estimatedKeysForRanges will do this for you. (Until we
have minhash or similar, a la CASSANDRA-6474, you'll probably want to assume
the worst case, i.e. no overlap among the sstables.)
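
As a rough sketch of that worst-case arithmetic: with no overlap assumed, the
per-sstable estimates simply add up, and the total could then be used to pick
a tree depth. The KeyEstimator interface, the class name, and the depthFor
helper below are hypothetical stand-ins for illustration, not Cassandra's
actual API, and sizing the tree this way is an assumption rather than
something stated in this thread.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for one sstable's index-sample estimate for a range.
// This is NOT Cassandra's SSTableReader API; it only illustrates the math.
interface KeyEstimator {
    long estimatedKeysInRange();
}

final class WorstCaseKeyEstimate {
    // Worst case: no partition appears in more than one sstable, so the
    // per-sstable estimates are summed without any deduplication.
    static long worstCaseKeys(List<KeyEstimator> sstables) {
        long total = 0;
        for (KeyEstimator s : sstables) {
            total += s.estimatedKeysInRange();
        }
        return total;
    }

    // Assumed sizing rule: the smallest depth d with 2^d >= keys, capped at
    // maxDepth. maxDepth must stay below 63 so the shift cannot overflow.
    static int depthFor(long keys, int maxDepth) {
        int depth = 0;
        while (depth < maxDepth && (1L << depth) < keys) {
            depth++;
        }
        return depth;
    }

    public static void main(String[] args) {
        List<KeyEstimator> sstables =
            Arrays.asList(() -> 100L, () -> 200L, () -> 300L);
        long keys = worstCaseKeys(sstables);
        System.out.println(keys + " keys, depth " + depthFor(keys, 20));
    }
}
```

Because overlapping sstables would be double counted, this estimate can only
err on the high side, which makes the tree deeper than strictly needed rather
than too shallow.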
100MB isn't much in an 8GB heap. I don't think we need to worry about that.
Is the tree building CPU-bound or I/O-bound?
> Allow Merkle tree maximum depth to be configurable
> --------------------------------------------------
>
> Key: CASSANDRA-5263
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5263
> Project: Cassandra
> Issue Type: Improvement
> Components: Config
> Affects Versions: 1.1.9
> Reporter: Ahmed Bashir
> Assignee: Minh Do
>
> Currently, the maximum depth allowed for Merkle trees is hardcoded as 15.
> This value should be configurable, just like phi_convict_threshold and other
> properties.
> Given a cluster with nodes responsible for a large number of row keys, Merkle
> tree comparisons can result in a large number of row keys being streamed
> unnecessarily.
> Empirical testing indicates that reasonable changes to this depth (18, 20,
> etc.) don't affect the Merkle tree generation and differencing timings all
> that much, and they can significantly reduce the amount of data being
> streamed during repair.
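
To make the effect of depth concrete, here is a small sketch of the
arithmetic. A tree of depth d has 2^d leaves, and a mismatch in one leaf means
the whole key range behind that leaf is treated as different, so fewer keys
per leaf means less over-streaming. The class name and the figure of one
billion keys are made up for illustration and do not come from the issue.

```java
// Illustrative arithmetic only; not part of Cassandra.
final class MerkleDepthMath {
    // Number of leaves in a full binary tree of the given depth.
    static long leaves(int depth) {
        return 1L << depth;
    }

    // Keys covered by one leaf, rounded up, assuming keys spread evenly.
    static long keysPerLeaf(long totalKeys, int depth) {
        long n = leaves(depth);
        return (totalKeys + n - 1) / n;
    }

    public static void main(String[] args) {
        long totalKeys = 1_000_000_000L; // hypothetical node size
        for (int depth : new int[] {15, 18, 20}) {
            System.out.println("depth " + depth + ": " + leaves(depth)
                + " leaves, ~" + keysPerLeaf(totalKeys, depth)
                + " keys per leaf");
        }
    }
}
```

Under these assumptions, going from depth 15 to depth 20 multiplies the leaf
count by 32 and shrinks the key range behind a single mismatched leaf by the
same factor.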