I read through the recent thread "Cassandra mad GC", which seemed very
similar to my situation, but didn¹t really help.

Here is what I get from my logs when I grep for GCInspector. Note that this
is the middle of the night on a dev server, so there should have been almost
no load.

 INFO [ScheduledTasks:1] 2014-01-29 02:41:16,579 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 341 ms for 1 collections, 8001582816 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:41:29,135 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 350 ms for 1 collections, 8027555576 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:41:41,646 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 364 ms for 1 collections, 8075851136 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:41:54,223 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 375 ms for 1 collections, 8124762400 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:42:24,258 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 22995 ms for 2 collections, 7385470288 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:45:21,328 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 218 ms for 1 collections, 7582480104 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:45:33,418 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 222 ms for 1 collections, 7584743872 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:45:45,527 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 217 ms for 1 collections, 7588514264 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:45:57,594 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 223 ms for 1 collections, 7590223632 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:46:09,686 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 226 ms for 1 collections, 7592826720 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:46:21,867 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 229 ms for 1 collections, 7595464520 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:46:33,869 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 227 ms for 1 collections, 7597109672 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:46:45,962 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 230 ms for 1 collections, 7599909296 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:46:57,964 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 230 ms for 1 collections, 7601584048 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:47:10,018 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 229 ms for 1 collections, 7604217952 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:47:22,136 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 236 ms for 1 collections, 7605867784 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:47:34,277 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 239 ms for 1 collections, 7607521456 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:47:46,292 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 235 ms for 1 collections, 7610667376 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:47:58,537 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 261 ms for 1 collections, 7650345088 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:48:10,783 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 269 ms for 1 collections, 7653016592 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:48:23,786 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 298 ms for 1 collections, 7716831032 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:48:35,988 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 308 ms for 1 collections, 7745178616 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:48:48,434 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 319 ms for 1 collections, 7796207088 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:49:00,902 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 320 ms for 1 collections, 7821378680 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:49:13,344 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 338 ms for 1 collections, 7859905288 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:49:25,471 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 352 ms for 1 collections, 7911145688 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:49:38,473 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 359 ms for 1 collections, 7938204144 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:49:50,895 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 368 ms for 1 collections, 7988088408 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:50:03,345 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 379 ms for 1 collections, 8014757368 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:50:15,920 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 384 ms for 1 collections, 8044074584 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:50:28,484 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 397 ms for 1 collections, 8094146216 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:50:41,040 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 405 ms for 1 collections, 8118675056 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:50:53,628 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 408 ms for 1 collections, 8121201008 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:51:19,816 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 26041 ms for 1 collections, 7385748352 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:59:38,271 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 20970 ms for 1 collections, 7379533664 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 03:00:07,020 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 25835 ms for 1 collections, 7386103248 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 03:00:28,777 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 20278 ms for 1 collections, 7380769872 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 03:00:55,514 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 24909 ms for 1 collections, 7382846344 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 03:01:23,741 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 26223 ms for 1 collections, 7387539176 used; max
is 8126464000

There are various long GC¹s sprinkled throughout my logs. Sometimes it
recovers from them. Sometime it goes into a death spiral.

About a minute prior to the big meltdown at 02:51:19 there was a lot of
activity on the local CF. About every 10 milliseconds the following is
repeated:

 INFO [MutationStage:552] 2014-01-29 02:50:35,489 ColumnFamilyStore.java
(line 740) Enqueuing flush of Memtable-local@1059338449(73/3672
serialized/live bytes, 2 ops)
 INFO [FlushWriter:45] 2014-01-29 02:50:35,489 Memtable.java (line 333)
Writing Memtable-local@1059338449(73/3672 serialized/live bytes, 2 ops)
 INFO [FlushWriter:45] 2014-01-29 02:50:35,492 Memtable.java (line 373)
Completed flushing 
/var/lib/cassandra/data/system/local/system-local-jb-104855-Data.db (93
bytes) for commitlog position ReplayPosition(segmentId=1390945513179,
position=25932849)

It seems strange to be repeatedly flushing a table at that frequency. Other
than that, I don¹t see anything odd in the logs just before GC went off the
rails.

I have not done any tuning. All my JVM settings are default. For the record,
here¹s what I get from ps:

java -ea 
-javaagent:/usr/local/apache-cassandra-2.0.4/bin/../lib/jamm-0.2.5.jar
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms7830M -Xmx7830M
-Xmn800M -XX:+HeapDumpOnOutOfMemoryError -Xss256k
-XX:StringTableSize=1000003 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseTLAB -XX:+UseCondCardMark Š

I have a half-dozen keyspaces with a half-dozen tables per keyspace, and a
total of several gigabytes of data. Jut a one-node cluster. Nothing really
remarkable. Any tables with records that could get really wide have been
sharded. No record should be bigger than a few MB.

On an unrelated note, I had to kill my server because it wouldn¹t do
anything but GC. Now I cannot start it. It errors out with the following:

java.lang.IllegalStateException: Unfinished compactions reference missing
sstables. This should never happen since compactions are marked finished
before we start removing the old sstables.
        at 
org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftover
s(ColumnFamilyStore.java:495)
        at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:264)
        at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:4
61)
        at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:504)
java.lang.IllegalStateException: Unfinished compactions reference missing
sstables. This should never happen since compactions are marked finished
before we start removing the old sstables.
        at 
org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftover
s(ColumnFamilyStore.java:495)
        at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:264)
        at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:4
61)
        at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:504)
Exception encountered during startup: Unfinished compactions reference
missing sstables. This should never happen since compactions are marked
finished before we start removing the old sstables.

I have a single node in my cluster. Is there any way to recover, or is my
data toast? I can restore my data, so this isn¹t a primary issue for me. The
GC issue, on the other hand, is preventing me from moving to production.

Robert



Reply via email to