[
https://issues.apache.org/jira/browse/CASSANDRA-11390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15206074#comment-15206074
]
Marcus Olsson commented on CASSANDRA-11390:
-------------------------------------------
Sure! :) I've taken a glance at the patch and it looks good; it should reduce
the merkle tree memory usage in most cases. However, I'm worried that we could
still get into a situation where we allocate too much memory for the merkle
trees, since the max depth of {{20}} dates from when there was only a single
merkle tree. Because the number of concurrently repairing ranges varies, simply
lowering the max depth would probably be ineffective, since that would reduce
the resolution of the merkle trees when we are repairing fewer ranges. Instead
we could calculate the max depth each time. The number of nodes in a merkle
tree is roughly {{2^d}}, where {{d}} is the depth of the tree, and before
CASSANDRA-5220 the max number of nodes was {{2^20 = 1048576}}. If we want
{{2^20}} to remain the max total number of nodes, then {{ranges * 2^d}} has to
be at most {{2^20}}, which we can rearrange as:
{code}
2^20 >= ranges * 2^d
<=>
log2(2^20) >= log2(ranges * 2^d)
<=>
20 >= log2(ranges) + d
<=>
d <= 20 - log2(ranges)
{code}
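For example, with {{256}} concurrently repairing ranges we would get
{{d <= 20 - log2(256) = 12}}, and {{256 * 2^12 = 2^20}}, so the total stays
within the old budget.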
In Java:
{code}
// Cap the depth so that ranges * 2^maxDepth stays within the 2^20 node budget:
int maxDepth = (int) Math.floor(20 - Math.log(validator.desc.ranges.size()) / Math.log(2));
// And then calculate the depth using:
int depth = numPartitions > 0 ? (int) Math.min(Math.floor(Math.log(numPartitions)), maxDepth) : 0;
{code}
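As a quick sanity check of the formula, here is a small standalone sketch (not
Cassandra code; the range count is a plain parameter standing in for
{{validator.desc.ranges.size()}}) that prints the capped depth and the
resulting total node count for a few range counts:
{code}
public class MaxDepthCheck
{
    // Same calculation as above, with the range count as a parameter.
    static int maxDepth(int ranges)
    {
        return (int) Math.floor(20 - Math.log(ranges) / Math.log(2));
    }

    public static void main(String[] args)
    {
        for (int ranges : new int[]{ 1, 16, 256, 1024 })
        {
            int d = maxDepth(ranges);
            // With a power-of-two range count the total lands exactly on 2^20 = 1048576.
            long totalNodes = (long) ranges * (1L << d);
            System.out.printf("ranges=%d maxDepth=%d totalNodes=%d%n", ranges, d, totalNodes);
        }
    }
}
{code}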
---
Another thing that could also reduce the merkle tree sizes is if we are able to
estimate how much overlap there is between the sstables for each range, since
that could be used to reduce the estimated number of partitions. The
effectiveness of this would probably depend on the compaction strategy and on
how accurately we can calculate the range overlap, though.
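To illustrate the idea (purely a sketch; the helper and the overlap factor are
hypothetical, not an existing Cassandra API): if we could estimate an average
overlap factor for the sstables covering a range, the summed per-sstable
partition estimates could be discounted before sizing the tree:
{code}
import java.util.List;

public class PartitionEstimate
{
    /**
     * Hypothetical helper: discounts the summed per-sstable partition
     * estimates by an estimated overlap factor for the range.
     * A factor of 1.0 means the sstables are disjoint; 2.0 means each
     * partition is expected to appear in two sstables on average, etc.
     */
    static long estimatePartitions(List<Long> perSstableEstimates, double overlapFactor)
    {
        long sum = 0;
        for (long estimate : perSstableEstimates)
            sum += estimate;
        return (long) (sum / Math.max(1.0, overlapFactor));
    }
}
{code}
How accurately such a factor could be computed would, as noted, depend on the
compaction strategy.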
> Too big MerkleTrees allocated during repair
> -------------------------------------------
>
> Key: CASSANDRA-11390
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11390
> Project: Cassandra
> Issue Type: Bug
> Reporter: Marcus Eriksson
> Assignee: Marcus Eriksson
> Fix For: 3.0.x, 3.x
>
>
> Since CASSANDRA-5220 we create one merkle tree per range, but each of those
> trees is allocated to hold all the keys on the node, taking up too much memory