[ 
https://issues.apache.org/jira/browse/CASSANDRA-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879413#comment-16879413
 ] 

Jeff Jirsa edited comment on CASSANDRA-15202 at 7/5/19 4:45 PM:
----------------------------------------------------------------

Perf testing of this patch, using JMX toggling to enable/disable, resulted in 
the following GC graph:

 

!offheap-mts-gc.png!

 

This is a repair of a 12-instance cluster with 100 tables, running in a loop.
Starting at 04/08@~1330, the old-style repair was run. In the afternoon of
04/09, the property was changed to use the off-heap merkle trees, and the
result is pretty clear: ParNew collections drop from ~1-3s to ~100-300ms, and
old gen collections nearly completely disappear.


was (Author: jjirsa):
Perf testing of this patch, using JMX toggling to enable/disable, resulted in 
the following GC graph:

 

!offheap-mts-gc.png!

 

This is a repair of a 12-instance cluster with 100 tables, running in a loop.
Starting at 04/08@~1330, the old-style repair was run. In the afternoon of
04/09, the property was changed to use the off-heap merkle trees, and the
result is pretty clear: ParNew collections drop from ~3s to ~300ms, and old
gen collections nearly completely disappear.

> Deserialize merkle trees off-heap
> ---------------------------------
>
>                 Key: CASSANDRA-15202
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15202
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Consistency/Repair
>            Reporter: Jeff Jirsa
>            Assignee: Aleksey Yeschenko
>            Priority: Normal
>             Fix For: 4.0
>
>         Attachments: offheap-mts-gc.png
>
>
> CASSANDRA-14096 took the first step to address the heavy on-heap footprint of
> merkle trees on repair coordinators, by reducing the time frame over which
> they are referenced and by more intelligently limiting the depth of the trees
> based on available heap size.
> That alone improves the GC profile and prevents OOMs, but doesn’t address the
> issue entirely. The coordinator must still hold all the trees on heap at once
> until it’s done diffing them with each other, which has a negative effect,
> and, by reducing depth, we lose precision and thus cause more overstreaming
> than before.
> One way to improve the situation further is to build on CASSANDRA-14096 and
> move the trees entirely off-heap. This is a trivial endeavor, given that we
> are dealing with what should be full binary trees (though in practice they
> aren’t quite, yet). This JIRA takes the first step in that direction by moving
> just deserialisation off-heap, while leaving construction on the replicas
> on-heap for now.
> Additionally, the proposed patch fixes the issue of replica coordinators
> sending merkle trees to themselves over loopback, costing us a ser/deser loop
> per tree.
> Please note that there is more room for improvement here, and depending on 
> 4.0 timeline those improvements may or may not land in time. To name a few:
> - with some minor modifications to init(), we can make sure that no matter
> the range, the tree is *always* perfectly full; this would allow us to get
> rid of child pointers in inner nodes, as child node addresses will be
> trivially calculable given the fixed size of nodes
> - the trees can be easily constructed off-heap so long as you run init() to
> pre-size the tree to find out how large a buffer you need
> - the on-wire format doesn’t need to stream inner nodes, only leaves, and,
> really, only the hashes of the leaves
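
The three follow-up ideas above can be illustrated together. The sketch below is not Cassandra's MerkleTree and not the proposed patch; OffHeapFullTree and all of its methods are hypothetical names invented for illustration. It assumes a perfectly full tree with a power-of-two leaf count, fixed-size node hashes, and XOR as a stand-in for whatever function combines child hashes. Given only the leaf hashes (as they might arrive over the wire), it pre-sizes one direct buffer, places the leaves, and rebuilds the inner nodes, locating children by index arithmetic instead of pointers.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of a pointer-free, perfectly full hash tree in a direct buffer. */
final class OffHeapFullTree {
    private final ByteBuffer buf;  // direct (off-heap) buffer holding every node hash
    private final int hashSize;    // fixed size of each node, in bytes
    private final int leafCount;   // assumed to be a power of two

    private OffHeapFullTree(ByteBuffer buf, int hashSize, int leafCount) {
        this.buf = buf;
        this.hashSize = hashSize;
        this.leafCount = leafCount;
    }

    /** Rebuilds the whole tree from leaf hashes only; inner nodes are never transmitted. */
    static OffHeapFullTree fromLeafHashes(byte[][] leaves, int hashSize) {
        int n = leaves.length;
        if (n == 0 || Integer.bitCount(n) != 1)
            throw new IllegalArgumentException("leaf count must be a power of two");

        // Pre-size: a full binary tree with n leaves has exactly 2n - 1 nodes.
        ByteBuffer buf = ByteBuffer.allocateDirect((2 * n - 1) * hashSize);
        OffHeapFullTree tree = new OffHeapFullTree(buf, hashSize, n);

        // Leaves occupy node indices n-1 .. 2n-2.
        for (int i = 0; i < n; i++) {
            if (leaves[i].length != hashSize)
                throw new IllegalArgumentException("unexpected hash size at leaf " + i);
            int o = tree.offset(n - 1 + i);
            for (int b = 0; b < hashSize; b++)
                buf.put(o + b, leaves[i][b]);
        }

        // Inner nodes are filled bottom-up; XOR is an illustrative combining function.
        for (int i = n - 2; i >= 0; i--) {
            int l = tree.offset(leftChild(i));
            int r = tree.offset(rightChild(i));
            int o = tree.offset(i);
            for (int b = 0; b < hashSize; b++)
                buf.put(o + b, (byte) (buf.get(l + b) ^ buf.get(r + b)));
        }
        return tree;
    }

    // Child positions follow from the node index alone, so no child pointers are stored.
    static int leftChild(int i)  { return 2 * i + 1; }
    static int rightChild(int i) { return 2 * i + 2; }

    /** Byte offset of node i, computable because every node has the same size. */
    int offset(int i) { return i * hashSize; }

    int nodeCount() { return 2 * leafCount - 1; }

    boolean isOffHeap() { return buf.isDirect(); }

    /** Copies the hash of node i onto the heap (index 0 is the root). */
    byte[] hash(int i) {
        byte[] out = new byte[hashSize];
        int o = offset(i);
        for (int b = 0; b < hashSize; b++)
            out[b] = buf.get(o + b);
        return out;
    }
}
```

In this layout the only on-heap state per tree is the small wrapper object; the hashes themselves live in the direct buffer and are invisible to the young and old generation collectors, which is the effect the description is after.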


