[
https://issues.apache.org/jira/browse/CASSANDRA-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992394#comment-16992394
]
Chris Kistner edited comment on CASSANDRA-14355 at 12/10/19 10:17 AM:
----------------------------------------------------------------------
We have now experienced an issue that might be related to this. Our Cassandra
node has not crashed yet, but it has had frequent (roughly every 2 minutes)
"ConcurrentMarkSweep GC" events of 16+ seconds, e.g.:
{noformat}
WARN [Service Thread] 2019-12-10 08:03:19,969 GCInspector.java:282 -
ConcurrentMarkSweep GC in 19129ms. CMS Old Gen: 7547650016 -> 7547650048; Par
Eden Space: 671088640 -> 251798544; Par Survivor Space: 83886048 -> 0
WARN [Service Thread] 2019-12-10 08:03:37,565 GCInspector.java:282 -
ConcurrentMarkSweep GC in 16379ms. Par Eden Space: 671088640 -> 254509608; Par
Survivor Space: 83886032 -> 0{noformat}
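To pick such pauses out of a larger system.log, a minimal sketch along the following lines might help. It assumes the GCInspector warning format is exactly as in the excerpt above; the function name parse_gc_warning and both regular expressions are illustrative and are not part of Cassandra. For the first warning it shows that CMS Old Gen occupancy grew by 32 bytes across a 19-second collection, i.e. the old generation was not reduced at all.

```python
import re

# Assumed format, taken from the log excerpt above:
#   "<Collector> GC in <N>ms. <Pool>: <before> -> <after>; <Pool>: ..."
GC_LINE = re.compile(r"(?P<collector>\w+) GC in (?P<ms>\d+)ms\.(?P<pools>.*)")
POOL = re.compile(r"([A-Za-z ]+?): (\d+) -> (\d+)")


def parse_gc_warning(line):
    """Return (collector, pause_ms, {pool: (before, after)}) or None.

    Hypothetical helper for illustration; expects the whole warning on one line.
    """
    m = GC_LINE.search(line)
    if m is None:
        return None
    pools = {
        name.strip(): (int(before), int(after))
        for name, before, after in POOL.findall(m.group("pools"))
    }
    return m.group("collector"), int(m.group("ms")), pools


# First warning from the excerpt, joined onto a single line.
line = (
    "WARN [Service Thread] 2019-12-10 08:03:19,969 GCInspector.java:282 - "
    "ConcurrentMarkSweep GC in 19129ms. CMS Old Gen: 7547650016 -> 7547650048; "
    "Par Eden Space: 671088640 -> 251798544; Par Survivor Space: 83886048 -> 0"
)

collector, pause_ms, pools = parse_gc_warning(line)
old_before, old_after = pools["CMS Old Gen"]
print(collector, pause_ms, old_after - old_before)
# prints: ConcurrentMarkSweep 19129 32
```

Filtering on pause_ms above some threshold (for example 1000) would list only the long collections.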
Sometimes the GC time dropped back to about 200ms, and after we ran "nodetool
drain" and then removed the node from the cluster, it stayed below 250ms.
Our setup is:
* 5 nodes in dc1, 5 nodes in dc2.
* RF: dc1=5, dc2=5
* CL = Local Quorum
* Host with 32GB of RAM -> Cassandra allocates 8GB to heap
* Java version: java-1.8.0-openjdk-1.8.0.151-5.b12
* Using Cassandra Reaper 4.6.1 where we scheduled a repair with 32
segments/node (364 segments in total)
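For reference, the quorum size and heap size implied by this setup can be checked with a short sketch. It assumes the heap was left to the default sizing logic in cassandra-env.sh (max of "half of RAM capped at 1GB" and "a quarter of RAM capped at 8GB"); whether the heap was set explicitly on these hosts is not stated here. The function names are illustrative only.

```python
def quorum(replication_factor):
    """Replicas that must respond for (LOCAL_)QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def default_max_heap_mb(system_memory_mb):
    """Assumed default heap sizing, mirroring the stock cassandra-env.sh rule:
    max(min(RAM / 2, 1024 MB), min(RAM / 4, 8192 MB))."""
    half = min(system_memory_mb // 2, 1024)
    quarter = min(system_memory_mb // 4, 8192)
    return max(half, quarter)


print(quorum(5))                       # prints: 3
print(default_max_heap_mb(32 * 1024))  # prints: 8192
```

With RF=5 per datacenter, LOCAL_QUORUM needs 3 of the 5 local replicas, and a 32GB host ends up with the 8GB heap listed above.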
I have attached some screenshots from our ~11GB heap dump, in which
io.netty.util.concurrent.FastThreadLocalThread accounted for 6.4GB of the
heap:
* Problem Suspect 1: LongGC_Problem-Suspect-1_FastThreadLocalThread.png
* Dominator Tree: LongGC_Dominator-Tree.png
* Histogram: LongGC_Histogram.png
I have also attached the output of "nodetool status": LongGC_nodetool_info.txt
We have not yet tried Cassandra 3.11.5, which apparently fixed the repair OOME
issue (CASSANDRA-14096).
> Memory leak
> -----------
>
> Key: CASSANDRA-14355
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14355
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Core
> Environment: Debian Jessie, OpenJDK 1.8.0_151
> Reporter: Eric Evans
> Priority: Normal
> Fix For: 3.11.x
>
> Attachments: 01_Screenshot from 2018-04-04 14-24-00.png,
> 02_Screenshot from 2018-04-04 14-28-33.png, 03_Screenshot from 2018-04-04
> 14-24-50.png, LongGC_Dominator-Tree.png, LongGC_Histogram.png,
> LongGC_Problem-Suspect-1_FastThreadLocalThread.png, LongGC_nodetool_info.txt
>
>
> We're seeing regular, frequent {{OutOfMemoryError}} exceptions. Similar to
> CASSANDRA-13754, an analysis of the heap dumps shows the heap consumed by the
> {{threadLocals}} member of the instances of
> {{io.netty.util.concurrent.FastThreadLocalThread}}.