[
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573404#comment-14573404
]
Philip Thompson commented on CASSANDRA-9549:
--------------------------------------------
The original description says this is happening on every node in the cluster,
and that they've all been restarted.
> Memory leak
> ------------
>
> Key: CASSANDRA-9549
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9549
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: Cassandra 2.1.5. 9 node cluster in EC2 (m1.large nodes,
> 2 cores 7.5G memory, 800G platter for cassandra data, root partition and
> commit log are on SSD EBS with sufficient IOPS), 3 nodes/availability zone, 1
> replica/zone
> JVM: /usr/java/jdk1.8.0_40/jre/bin/java
> JVM Flags besides CP: -ea -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar
> -XX:+CMSClassUnloadingEnabled -XX:+UseThreadPriorities
> -XX:ThreadPriorityPolicy=42 -Xms2G -Xmx2G -Xmn200M
> -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003
> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1
> -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+UseTLAB -XX:CompileCommandFile=/etc/cassandra/conf/hotspot_compiler
> -XX:CMSWaitDuration=10000 -XX:+CMSParallelInitialMarkEnabled
> -XX:+CMSEdenChunksRecordAlways -XX:CMSWaitDuration=10000 -XX:+UseCondCardMark
> -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199
> -Dcom.sun.management.jmxremote.rmi.port=7199
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dlogback.configurationFile=logback.xml -Dcassandra.logdir=/var/log/cassandra
> -Dcassandra.storagedir= -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid
> Kernel: Linux 2.6.32-504.16.2.el6.x86_64 #1 SMP x86_64 x86_64 x86_64 GNU/Linux
> Reporter: Ivar Thorson
> Priority: Critical
> Fix For: 2.1.x
>
> Attachments: cassandra.yaml, cpu-load.png, memoryuse.png,
> suspect.png, two-loads.png
>
>
> We have been experiencing a severe memory leak with Cassandra 2.1.5 that,
> over a couple of days, consumes all of the available JVM heap space,
> putting the JVM into GC hell where it keeps attempting CMS collections but
> can't free any heap space. This pattern occurs on every node in our cluster
> and requires rolling Cassandra restarts just to keep the cluster running.
> We upgraded the cluster from the 2.0 branch per the Datastax docs a couple
> of months ago and had been using the data from this cluster for more than a
> year without problems.
> As the heap fills with objects that cannot be garbage collected, the CPU/OS
> load average grows along with it. Heap dumps reveal an increasing number of
> java.util.concurrent.ConcurrentLinkedQueue$Node objects. We took heap dumps
> over a 2-day period and watched the number of Node objects go from 4M, to
> 19M, to 36M, and eventually to about 65M before the node stopped
> responding. The screen capture of our heap dump is from the 19M measurement.
> Load on the cluster is minimal. We can see this effect even with only a
> handful of writes per second (see attachments for Opscenter snapshots during
> very light loads and heavier loads). Even with only 5 reads per second we see
> this behavior.
> Log files show repeated errors logged from Ref.java:181 and Ref.java:279,
> including "LEAK DETECTED" messages:
> {code}
> ERROR [CompactionExecutor:557] 2015-06-01 18:27:36,978 Ref.java:279 - Error
> when closing class
> org.apache.cassandra.io.sstable.SSTableReader$InstanceTidier@1302301946:/data1/data/ourtablegoeshere-ka-1150
> java.util.concurrent.RejectedExecutionException: Task
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@32680b31
> rejected from
> org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor@573464d6[Terminated,
> pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1644]
> {code}
> {code}
> ERROR [Reference-Reaper:1] 2015-06-01 18:27:37,083 Ref.java:181 - LEAK
> DETECTED: a reference
> (org.apache.cassandra.utils.concurrent.Ref$State@74b5df92) to class
> org.apache.cassandra.io.sstable.SSTableReader$DescriptorTypeTidy@2054303604:/data2/data/ourtablegoeshere-ka-1151
> was not released before the reference was garbage collected
> {code}
> This might be related to [CASSANDRA-8723]?
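
For readers unfamiliar with the RejectedExecutionException in the first log
excerpt: it is the standard JDK response when a task is handed to an executor
that has already been shut down. The sketch below reproduces only that JDK
behavior with a plain ScheduledThreadPoolExecutor. It assumes that Cassandra's
DebuggableScheduledThreadPoolExecutor keeps the default rejection policy, and
the class and method names are hypothetical. It does not reproduce or explain
the leak itself.

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of Cassandra.
public class RejectedAfterShutdown {

    /** Returns true if scheduling a task on a shut-down executor is rejected. */
    static boolean scheduleAfterShutdownIsRejected() {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        // After shutdown() the executor no longer accepts new tasks.
        executor.shutdown();
        try {
            // With the default AbortPolicy this call throws instead of queueing.
            executor.schedule(() -> {}, 0, TimeUnit.SECONDS);
            return false;
        } catch (RejectedExecutionException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected = " + scheduleAfterShutdownIsRejected());
    }
}
```

Running main should print "rejected = true". In the log above the rejecting
executor reports [Terminated, pool size = 0], which suggests the tidy task was
scheduled after that executor had already been stopped.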
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)