[ 
https://issues.apache.org/jira/browse/CASSANDRA-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-15202:
------------------------------------------
    Source Control Link: 
[2117e2af00603f5fb2181e53dbcba190b2eab861|https://github.com/apache/cassandra/commit/2117e2af00603f5fb2181e53dbcba190b2eab861]
                 Status: Resolved  (was: Ready to Commit)
             Resolution: Fixed

Cheers, committed to trunk as 
[2117e2af00603f5fb2181e53dbcba190b2eab861|https://github.com/apache/cassandra/commit/2117e2af00603f5fb2181e53dbcba190b2eab861]

On commit, I fixed the {{find()}} javadoc and extended the {{difference()}} 
tests to cover serde loops of a previously moved off-heap tree. It's not a 
code path we ever use in trunk, but it is something we do in 3.0 when dumping 
trees for debugging.

> Deserialize merkle trees off-heap
> ---------------------------------
>
>                 Key: CASSANDRA-15202
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15202
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Consistency/Repair
>            Reporter: Jeff Jirsa
>            Assignee: Aleksey Yeschenko
>            Priority: Normal
>             Fix For: 4.0
>
>         Attachments: offheap-mts-gc.png
>
>
> CASSANDRA-14096 took the first step to address the heavy on-heap footprint of 
> merkle trees on repair coordinators - by reducing the time frame over which 
> they are referenced, and by more intelligently limiting the depth of the trees 
> based on available heap size.
> That alone improves GC profile and prevents OOMs, but doesn’t address the 
> issue entirely. The coordinator still must hold all the trees on heap at once 
> until it’s done diffing them with each other, which has a negative effect, 
> and, by reducing depth, we lose precision and thus cause more overstreaming 
> than before.
> One way to improve the situation further is to build on CASSANDRA-14096 and 
> move the trees entirely off-heap. This is a trivial endeavor, given that we 
> are dealing with what should be full binary trees (though in practice aren’t 
> quite, yet). This JIRA takes the first step in that direction - by moving just 
> deserialisation off-heap, while construction on the replicas stays on-heap.
> Additionally, the proposed patch fixes the issue of replica coordinators 
> sending merkle trees to themselves over loopback, costing us a ser/deser loop 
> per tree.
> Please note that there is more room for improvement here, and depending on 
> 4.0 timeline those improvements may or may not land in time. To name a few:
> - with some minor modifications to init(), we can make sure that no matter 
> the range, the tree is *always* perfectly full; this would allow us to get 
> rid of child pointers in inner nodes, as child node addresses will be 
> trivially calculable given the fixed size of nodes
> - the trees can be easily constructed off-heap so long as you run init() to 
> pre-size the tree to find out how large a buffer you need
> - on-wire format doesn’t need to stream inner nodes, only leaves, and, 
> really, only the hashes of the leaves
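
The three ideas listed above can be illustrated with a minimal sketch. This is not the code from the patch: the class name, the 32-byte node size, the level-order layout, and the use of XOR to combine child hashes are all assumptions made for illustration. It shows how a perfectly full tree of fixed-size nodes needs no child pointers, how the buffer can be sized up front from the depth alone, and how inner nodes could be rebuilt from leaf hashes if only leaves were sent over the wire.

```java
import java.nio.ByteBuffer;

// Hypothetical layout helper, not part of Cassandra.
final class ImplicitTreeLayout {
    // Assumed fixed node size: each node stores only a 32-byte hash.
    static final int HASH_SIZE = 32;

    // A perfectly full binary tree of the given depth has 2^(depth+1) - 1 nodes.
    static int nodeCount(int depth) {
        if (depth < 0 || depth > 24) // keep the byte size within int range
            throw new IllegalArgumentException("depth out of range: " + depth);
        return (1 << (depth + 1)) - 1;
    }

    // In level order the leaves occupy the last 2^depth slots.
    static int firstLeaf(int depth) { return (1 << depth) - 1; }

    // Child positions follow from the parent's index; no pointers are stored.
    static int leftChild(int node)  { return 2 * node + 1; }
    static int rightChild(int node) { return 2 * node + 2; }

    // Byte offset of a node inside the buffer.
    static int offsetOf(int node) { return node * HASH_SIZE; }

    // Pre-size an off-heap (direct) buffer once the depth is known.
    static ByteBuffer allocate(int depth) {
        return ByteBuffer.allocateDirect(nodeCount(depth) * HASH_SIZE);
    }

    // Rebuild every inner hash bottom-up from the leaf hashes.
    // XOR stands in for whatever combining function the real tree uses.
    static void rebuildInnerNodes(ByteBuffer tree, int depth) {
        for (int node = firstLeaf(depth) - 1; node >= 0; node--) {
            int self = offsetOf(node);
            int left = offsetOf(leftChild(node));
            int right = offsetOf(rightChild(node));
            for (int b = 0; b < HASH_SIZE; b++)
                tree.put(self + b, (byte) (tree.get(left + b) ^ tree.get(right + b)));
        }
    }

    public static void main(String[] args) {
        int depth = 3;
        ByteBuffer tree = allocate(depth);
        // Placeholder leaf data: one distinguishing byte per leaf.
        for (int leaf = firstLeaf(depth); leaf < nodeCount(depth); leaf++)
            tree.put(offsetOf(leaf), (byte) (leaf + 1));
        rebuildInnerNodes(tree, depth);
        System.out.println("nodes=" + nodeCount(depth) + " bytes=" + tree.capacity());
    }
}
```

With this layout, only the leaf slots would need to be filled on deserialisation; everything above them is derived, which is the saving the last bullet points at.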



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
