[
https://issues.apache.org/jira/browse/CASSANDRA-8667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Philip Thompson updated CASSANDRA-8667:
---------------------------------------
Fix Version/s: 2.0.14
> ConcurrentMarkSweep loop
> -------------------------
>
> Key: CASSANDRA-8667
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8667
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: dse 4.5.4 (cassandra 2.0.11.82), aws i2.2xlarge nodes
> Reporter: Gil Ganz
> Fix For: 2.0.14
>
> Attachments: cassandra-env.sh, cassandra.yaml
>
>
> hey,
> we are having an issue with nodes that for some reason get into a full gc
> loop and never recover. it can happen on any node from time to time, but
> recently we have a node (which was added to the cluster 2 days ago) that
> gets into this state every time.
> the scenario is like this:
> almost no writes/reads going to the cluster (<500 reads or writes per
> second). the node is up for 10-20 minutes, doing compactions of big column
> families, and then full gc starts to kick in, doing loops of ~60sec cms
> collections even though the heap is not full. compaction becomes really
> slow and the node starts to appear down to other nodes.
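> (a note on the gc.log excerpts below: the "Statistics for
> BinaryTreeDictionary" blocks are cms free list statistics, which as far as
> I know only show up when running with the flag below, commented out in the
> stock cassandra-env.sh:
> JVM_OPTS="$JVM_OPTS -XX:PrintFLSStatistics=1"
> so assume that is enabled here on top of cassandra's default gc logging)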
> from system.log:
> INFO [ScheduledTasks:1] 2015-01-21 23:02:29,552 GCInspector.java (line 116)
> GC for ConcurrentMarkSweep: 36444 ms for 1 collections, 6933307656 used; max
> is 10317987840
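> (for scale, assuming GCInspector reports bytes: 6933307656 / 10317987840
> is roughly 67% of the max heap, yet the gc.log below shows the cms old gen
> at 6389759K of 6389760K, i.e. completely full, so this looks like old-gen
> exhaustion rather than overall heap pressure)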
> from gc.log.0:
> 2015-01-21T23:01:53.072-0800: 1541.643: [CMS2015-01-21T23:01:56.440-0800:
> 1545.011: [CMS-concurrent-mark: 13.914/13.951 secs] [Times: user=62.39
> sys=7.05, real=13.95 secs]
> (concurrent mode failure)CMS: Large block 0x0000000000000000
> : 6389749K->6389759K(6389760K), 36.1323980 secs]
> 10076149K->6685617K(10076160K), [CMS Perm : 28719K->28719K(47840K)]After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 0
> Max Chunk Size: 0
> Number of Blocks: 0
> Tree Height: 0
> After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 24576
> Max Chunk Size: 24576
> Number of Blocks: 1
> Av. Block Size: 24576
> Tree Height: 1
> , 36.1327700 secs] [Times: user=40.90 sys=0.00, real=36.14 secs]
> Heap after GC invocations=236 (full 19):
> par new generation total 3686400K, used 295857K [0x000000057ae00000,
> 0x0000000674e00000, 0x0000000674e00000)
> eden space 3276800K, 9% used [0x000000057ae00000, 0x000000058ceec4c0,
> 0x0000000642e00000)
> from space 409600K, 0% used [0x000000065be00000, 0x000000065be00000,
> 0x0000000674e00000)
> to space 409600K, 0% used [0x0000000642e00000, 0x0000000642e00000,
> 0x000000065be00000)
> concurrent mark-sweep generation total 6389760K, used 6389759K
> [0x0000000674e00000, 0x00000007fae00000, 0x00000007fae00000)
> concurrent-mark-sweep perm gen total 48032K, used 28719K
> [0x00000007fae00000, 0x00000007fdce8000, 0x0000000800000000)
> }
> 2015-01-21T23:02:29.204-0800: 1577.776: Total time for which application
> threads were stopped: 36.1334050 seconds
> 2015-01-21T23:02:29.239-0800: 1577.810: Total time for which application
> threads were stopped: 0.0060230 seconds
> 2015-01-21T23:02:29.239-0800: 1577.811: [GC [1 CMS-initial-mark:
> 6389759K(6389760K)] 6769792K(10076160K), 0.3112760 secs] [Times: user=0.00
> sys=0.00, real=0.31 secs]
> 2015-01-21T23:02:29.551-0800: 1578.122: Total time for which application
> threads were stopped: 0.3118580 seconds
> 2015-01-21T23:02:29.551-0800: 1578.122: [CMS-concurrent-mark-start]
> 2015-01-21T23:02:29.635-0800: 1578.206: Total time for which application
> threads were stopped: 0.0060250 seconds
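> the concurrent mode failure, with the old gen sitting at
> 6389759K(6389760K), means cms cannot finish its concurrent cycle before
> the old gen fills up. one thing we may try (a sketch only, on top of the
> stock cassandra-env.sh defaults, which already set the fraction to 75):
> JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=60"
> JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
> starting cms earlier should give it more headroom to finish, though it
> will not help if the real problem is fragmentation (note the "Max Chunk
> Size: 0" in the free list statistics above)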
> machines are i2.2xlarge (8 cores, 60gb ram), datadir is on ssd ephemeral,
> heap size is 10g with a 4gb newgen (following a dse recommendation to
> solve another issue with many parnew gc's going on)
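> for reference, in cassandra-env.sh terms that amounts to roughly this
> (standard 2.0 variables, exact values are in the attached file):
> MAX_HEAP_SIZE="10G"
> HEAP_NEWSIZE="4G"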
> 2 dc cluster, 8 nodes in the west, 17 nodes in the east (main dc), read
> heavy (15k writes per second, and at least that many reads per second
> right now due to the problems, but it was as high as 35k reads per second
> in the past).
> attached are the yaml and env files
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)