[
https://issues.apache.org/jira/browse/CASSANDRA-8723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311864#comment-14311864
]
Brent Haines commented on CASSANDRA-8723:
-----------------------------------------
I ran
{ watch -n 10 'nodetool compactionstats' }
on the affected node and watched it for a while. For us it would always end up
on the same compaction of the same CF, where it would get stuck until the OOM
happened. The stats on the compaction give you a hint -- the total number of
bytes is the same each time, and the compaction gets some portion of the way
through before progress freezes and eventually the system runs OOM.
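
If you would rather not stare at the terminal, the same check can be scripted.
This is only a rough sketch, not something we ran in production. It assumes
the compactionstats rows end with the columns keyspace, table, completed,
total, unit and progress (as on our 2.1 nodes; other versions may print
different columns), and the names parse_compactions, stalled and watch are
made up for illustration.

```python
import subprocess
import time


def parse_compactions(text):
    """Map (keyspace, table, total_bytes) -> completed_bytes.

    Assumed row layout (counted from the end of the line, so a multi-word
    compaction type does not matter):
        <type> <keyspace> <table> <completed> <total> <unit> <progress>%
    """
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 7 or not fields[-1].endswith("%"):
            continue  # header, "pending tasks" line, or remaining-time line
        if not (fields[-4].isdigit() and fields[-3].isdigit()):
            continue  # not plain byte counts; layout differs from the assumption
        keyspace, table = fields[-6], fields[-5]
        completed, total = int(fields[-4]), int(fields[-3])
        rows[(keyspace, table, total)] = completed
    return rows


def stalled(previous, current):
    """Compactions seen in both samples whose completed byte count did not move."""
    return [key for key, done in current.items() if previous.get(key) == done]


def watch(interval=10, strikes=6):
    """Poll nodetool and report compactions frozen for `strikes` polls in a row."""
    previous, frozen_polls = {}, {}
    while True:
        output = subprocess.run(
            ["nodetool", "compactionstats"],
            capture_output=True, text=True, check=True,
        ).stdout
        current = parse_compactions(output)
        # Keep a counter only for compactions that are still frozen.
        frozen_polls = {
            key: frozen_polls.get(key, 0) + 1 for key in stalled(previous, current)
        }
        for (keyspace, table, total), polls in frozen_polls.items():
            if polls >= strikes:
                print(f"{keyspace}.{table}: no progress for {polls} polls "
                      f"(total {total} bytes)")
        previous = current
        time.sleep(interval)
```

Calling watch() on the node polls every 10 seconds by default, matching the
watch command above. The strikes threshold is arbitrary; a large compaction
can legitimately sit still for a poll or two.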
We have the standard replication factor of 3, so it was no big deal to stop
cassandra, delete the node's storage of that CF, and then restart and run
repair. Care must be taken, obviously, but it did recover steady state for us
on 3 separate incidents. Once it's fixed on a node, we haven't had issues
return for that node.
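
For anyone repeating this, the sequence can be written down as a dry run that
only prints the commands for review and executes nothing. Treat it as a
sketch under assumptions: the service is managed with "service cassandra",
the data root is /var/lib/cassandra/data, and each table lives in a directory
named <table>-<id> under its keyspace. None of that comes from our actual
setup notes, and the function names are hypothetical, so check the paths on
your own node first.

```python
import glob
import os
import shlex


def find_table_dirs(keyspace, table, data_root="/var/lib/cassandra/data"):
    """List on-disk directories for one table (assumed <table>-<id> naming)."""
    return sorted(glob.glob(os.path.join(data_root, keyspace, table + "-*")))


def recovery_commands(keyspace, table, table_dirs):
    """Build the stop / delete / start / repair sequence as strings.

    Nothing is executed here; the caller reviews and runs the commands by
    hand. Only safe when the replication factor leaves other replicas with
    the data, and repair should wait until the node is back up.
    """
    commands = ["service cassandra stop"]  # assumed init-script name
    commands += ["rm -rf -- " + shlex.quote(path) for path in table_dirs]
    commands.append("service cassandra start")
    commands.append(
        "nodetool repair {} {}".format(shlex.quote(keyspace), shlex.quote(table))
    )
    return commands


def print_plan(keyspace, table):
    """Print the plan for a keyspace/table pair found under the default data root."""
    for command in recovery_commands(keyspace, table, find_table_dirs(keyspace, table)):
        print(command)
```

Running print_plan("my_keyspace", "my_cf") with your own names lists what
would be removed before you touch anything.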
> Cassandra 2.1.2 Memory issue - java process memory usage continuously
> increases until process is killed by OOM killer
> ---------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-8723
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8723
> Project: Cassandra
> Issue Type: Bug
> Reporter: Jeff Liu
> Fix For: 2.1.3
>
> Attachments: cassandra.yaml
>
>
> Issue:
> We have an on-going issue with cassandra nodes running with continuously
> increasing memory until killed by OOM.
> {noformat}
> Jan 29 10:15:41 cass-chisel19 kernel: [24533109.783481] Out of memory: Kill
> process 13919 (java) score 911 or sacrifice child
> Jan 29 10:15:41 cass-chisel19 kernel: [24533109.783557] Killed process 13919
> (java) total-vm:18366340kB, anon-rss:6461472kB, file-rss:6684kB
> {noformat}
> System Profile:
> cassandra version 2.1.2
> system: aws c1.xlarge instance with 8 cores, 7.1G memory.
> cassandra jvm:
> -Xms1792M -Xmx1792M -Xmn400M -Xss256k
> {noformat}
> java -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.8.jar
> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms1792M -Xmx1792M
> -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003
> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1
> -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+UseTLAB -XX:+CMSClassUnloadingEnabled -XX:+UseCondCardMark
> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
> -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintPromotionFailure -Xloggc:/var/log/cassandra/gc-1421511249.log
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=48M
> -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> -javaagent:/usr/share/java/graphite-reporter-agent-1.0-SNAPSHOT.jar=graphiteServer=metrics-a.hq.nest.com;graphitePort=2003;graphitePollInt=60
> -Dlogback.configurationFile=logback.xml
> -Dcassandra.logdir=/var/log/cassandra -Dcassandra.storagedir=
> -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid -cp
> /etc/cassandra:/usr/share/cassandra/lib/airline-0.6.jar:/usr/share/cassandra/lib/antlr-runtime-3.5.2.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/commons-math3-3.2.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.4.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-16.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.0.6.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.8.jar:/usr/share/cassandra/lib/javax.inject.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna-4.0.0.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:/usr/share/cassandra/lib/logback-classic-1.1.2.jar:/usr/share/cassandra/lib/logback-core-1.1.2.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/metrics-graphite-2.2.0.jar:/usr/share/cassandra/lib/mx4j-tools.jar:/usr/share/cassandra/lib/netty-all-4.0.23.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.2.jar:/usr/share/cassandra/lib/stream-2.5.2.jar:/usr/share/cassandra/lib/stringtemplate-4.0.2.jar:/usr/share/cassandra/lib/super-csv-2.1.0.jar:/usr/share/cassandra/lib/thrift-server-0.3.7.jar:/usr/share/cassandra/apache-cassandra-2.1.2.jar:/usr/share/cassandra/apache-cassandra-thrift-2.1.2.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/cassandra-driver-core-2.0.5.jar:/usr/share/cassandra/netty-3.9.0.Final.jar:/usr/share/cassandra/stress.jar:
> -XX:HeapDumpPath=/var/lib/cassandra/java_1421511248.hprof
> -XX:ErrorFile=/var/lib/cassandra/hs_err_1421511248.log
> org.apache.cassandra.service.CassandraDaemon
> {noformat}