[jira] [Comment Edited] (CASSANDRA-14355) Memory leak

Chris Kistner (Jira) Tue, 10 Dec 2019 02:18:09 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992394#comment-16992394
 ]


Chris Kistner edited comment on CASSANDRA-14355 at 12/10/19 10:17 AM:
----------------------------------------------------------------------

We have now experienced an issue that might be related to this. Our Cassandra 
node has not crashed yet, but it had frequent (roughly every 2 minutes) 
"ConcurrentMarkSweep GC" events of 16+ seconds!
E.g.:
{noformat}
WARN  [Service Thread] 2019-12-10 08:03:19,969 GCInspector.java:282 - 
ConcurrentMarkSweep GC in 19129ms.  CMS Old Gen: 7547650016 -> 7547650048; Par 
Eden Space: 671088640 -> 251798544; Par Survivor Space: 83886048 -> 0
WARN  [Service Thread] 2019-12-10 08:03:37,565 GCInspector.java:282 - 
ConcurrentMarkSweep GC in 16379ms.  Par Eden Space: 671088640 -> 254509608; Par 
Survivor Space: 83886032 -> 0{noformat}
Sometimes the GC time went back down to 200ms, and after we ran "nodetool drain" 
and then removed the node from the cluster, the GC time remained below 250ms.
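
To find such events across a whole system.log, the GCInspector warnings can be filtered by duration. The following is only a minimal sketch, not part of Cassandra or of our tooling: the function name long_gc_pauses and the 1000ms threshold are made up for illustration, and the regular expression assumes the message format shown in the excerpt above.

```python
import re

# Matches the summary part of a GCInspector warning, e.g.
# "ConcurrentMarkSweep GC in 19129ms" (format assumed from the excerpt above).
GC_PATTERN = re.compile(r"(?P<collector>\w+) GC in (?P<millis>\d+)ms")


def long_gc_pauses(log_text, threshold_ms=1000):
    """Return (collector, millis) for each GC event lasting >= threshold_ms.

    Hypothetical helper; the threshold default is an arbitrary choice.
    """
    # Log lines may be wrapped (as in this email), so collapse whitespace first.
    flattened = " ".join(log_text.split())
    return [
        (m.group("collector"), int(m.group("millis")))
        for m in GC_PATTERN.finditer(flattened)
        if int(m.group("millis")) >= threshold_ms
    ]


sample = """
WARN  [Service Thread] 2019-12-10 08:03:19,969 GCInspector.java:282 -
ConcurrentMarkSweep GC in 19129ms.  CMS Old Gen: 7547650016 -> 7547650048; Par
Eden Space: 671088640 -> 251798544; Par Survivor Space: 83886048 -> 0
WARN  [Service Thread] 2019-12-10 08:03:37,565 GCInspector.java:282 -
ConcurrentMarkSweep GC in 16379ms.  Par Eden Space: 671088640 -> 254509608; Par
Survivor Space: 83886032 -> 0
"""

print(long_gc_pauses(sample))
# [('ConcurrentMarkSweep', 19129), ('ConcurrentMarkSweep', 16379)]
```

Note that in the first event the CMS Old Gen size goes from 7547650016 to 7547650048 bytes, so that 19-second collection freed nothing in the old generation.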

Our setup is:
 * 5 nodes in dc1, 5 nodes in dc2.
 * RF: dc1=5, dc2=5
 * CL = Local Quorum
 * Host with 32GB of RAM -> Cassandra allocates 8GB to heap
 * Java version: java-1.8.0-openjdk-1.8.0.151-5.b12
 * Using Cassandra Reaper 4.6.1 where we scheduled a repair with 32 
segments/node (364 segments in total)
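
For readers who want to check the numbers in the setup above, here is a small sketch of the arithmetic. The function names are hypothetical. The heap calculation mirrors, as far as I understand it, the default in cassandra-env.sh for Cassandra 3.x (max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8GB))), and it assumes MAX_HEAP_SIZE has not been overridden.

```python
def quorum(replication_factor):
    """Replicas that must respond for (LOCAL_)QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def default_max_heap_mb(system_memory_mb):
    """Default max heap in MB, assuming the stock cassandra-env.sh logic
    and no MAX_HEAP_SIZE override (an assumption about the setup)."""
    half = min(system_memory_mb // 2, 1024)      # min(1/2 RAM, 1024MB)
    quarter = min(system_memory_mb // 4, 8192)   # min(1/4 RAM, 8GB)
    return max(half, quarter)


print(quorum(5))                       # 3
print(default_max_heap_mb(32 * 1024))  # 8192
```

With RF=5 in each DC, Local Quorum therefore needs 3 local replicas, and a 32GB host ends up with the 8GB heap listed above.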

I have attached some screenshots from our ~11GB heap dump, in which 
io.netty.util.concurrent.FastThreadLocalThread accounts for 6.4GB of the 
heap size:
 * Problem Suspect 1: LongGC_Problem-Suspect-1_FastThreadLocalThread.png
 * Dominator Tree: LongGC_Dominator-Tree.png
 * Histogram: LongGC_Histogram.png
I have also attached the output of "nodetool status": LongGC_nodetool_info.txt

We have not yet tried Cassandra 3.11.5, which apparently fixed the repair OOME 
issue (CASSANDRA-14096).


was (Author: padakwaak):
We have now experienced an issue that might be related to this. Our Cassandra 
node has not crashed yet, but it had frequent (roughly every 2 minutes) 
"ConcurrentMarkSweep GC" events of 16+ seconds!
E.g.:
{noformat}
Line 78776: WARN  [Service Thread] 2019-12-10 08:03:19,969 GCInspector.java:282 
- ConcurrentMarkSweep GC in 19129ms.  CMS Old Gen: 7547650016 -> 7547650048; 
Par Eden Space: 671088640 -> 251798544; Par Survivor Space: 83886048 -> 0
        Line 79080: WARN  [Service Thread] 2019-12-10 08:03:37,565 
GCInspector.java:282 - ConcurrentMarkSweep GC in 16379ms.  Par Eden Space: 
671088640 -> 254509608; Par Survivor Space: 83886032 -> 0{noformat}
Sometimes the GC time went back down to 200ms, and after we ran "nodetool drain" 
and then removed the node from the cluster, the GC time remained below 250ms.

Our setup is:
 * 5 nodes in dc1, 5 nodes in dc2.
 * RF: dc1=5, dc2=5
 * CL = Local Quorum
 * Host with 32GB of RAM -> Cassandra allocates 8GB to heap
 * Java version: java-1.8.0-openjdk-1.8.0.151-5.b12
 * Using Cassandra Reaper 4.6.1 where we scheduled a repair with 32 
segments/node (364 segments in total)

I have attached some screenshots from our ~11GB heap dump, in which 
io.netty.util.concurrent.FastThreadLocalThread accounts for 6.4GB of the 
heap size:
* Problem Suspect 1: LongGC_Problem-Suspect-1_FastThreadLocalThread.png
* Dominator Tree: LongGC_Dominator-Tree.png
* Histogram: LongGC_Histogram.png
I have also attached the output of "nodetool status": LongGC_nodetool_info.txt

We have not yet tried Cassandra 3.11.5, which apparently fixed the repair OOME 
issue (CASSANDRA-14096).

> Memory leak
> -----------
>
>                 Key: CASSANDRA-14355
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14355
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Core
>         Environment: Debian Jessie, OpenJDK 1.8.0_151
>            Reporter: Eric Evans
>            Priority: Normal
>             Fix For: 3.11.x
>
>         Attachments: 01_Screenshot from 2018-04-04 14-24-00.png, 
> 02_Screenshot from 2018-04-04 14-28-33.png, 03_Screenshot from 2018-04-04 
> 14-24-50.png, LongGC_Dominator-Tree.png, LongGC_Histogram.png, 
> LongGC_Problem-Suspect-1_FastThreadLocalThread.png, LongGC_nodetool_info.txt
>
>
> We're seeing regular, frequent {{OutOfMemoryError}} exceptions.  Similar to 
> CASSANDRA-13754, an analysis of the heap dumps shows the heap consumed by the 
> {{threadLocals}} member of the instances of 
> {{io.netty.util.concurrent.FastThreadLocalThread}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
