[ https://issues.apache.org/jira/browse/CASSANDRA-8667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Thompson updated CASSANDRA-8667:
---------------------------------------
    Fix Version/s: 2.0.14

> ConcurrentMarkSweep loop 
> -------------------------
>
>                 Key: CASSANDRA-8667
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8667
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>          Environment: dse 4.5.4 (cassandra 2.0.11.82), aws i2.2xlarge nodes
>            Reporter: Gil Ganz
>             Fix For: 2.0.14
>
>         Attachments: cassandra-env.sh, cassandra.yaml
>
>
> Hey,
> We are having an issue with nodes that, for some reason, get into a full GC 
> loop and never recover. It can happen on any node from time to time, but 
> recently one node (which was added to the cluster 2 days ago) hits this 
> every time.
> The scenario is like this: with almost no writes/reads going to the cluster 
> (<500 reads or writes per second), the node is up for 10-20 minutes, doing 
> compactions of big column families, and then full GC starts to kick in, 
> doing loops of 60-second CMS collections even though the heap is not full. 
> Compaction becomes really slow and the node starts to look down to other 
> nodes.
> from system.log :
> INFO [ScheduledTasks:1] 2015-01-21 23:02:29,552 GCInspector.java (line 116) 
> GC for ConcurrentMarkSweep: 36444 ms for 1 collections, 6933307656 used; max 
> is 10317987840
> from gc.log.0:
> 2015-01-21T23:01:53.072-0800: 1541.643: [CMS2015-01-21T23:01:56.440-0800: 
> 1545.011: [CMS-concurrent-mark: 13.914/13.951 secs] [Times: user=62.39 
> sys=7.05, real=13.95 secs]
>  (concurrent mode failure)CMS: Large block 0x0000000000000000
> : 6389749K->6389759K(6389760K), 36.1323980 secs] 
> 10076149K->6685617K(10076160K), [CMS Perm : 28719K->28719K(47840K)]After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 0
> Max   Chunk Size: 0
> Number of Blocks: 0
> Tree      Height: 0
> After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 24576
> Max   Chunk Size: 24576
> Number of Blocks: 1
> Av.  Block  Size: 24576
> Tree      Height: 1
> , 36.1327700 secs] [Times: user=40.90 sys=0.00, real=36.14 secs]
> Heap after GC invocations=236 (full 19):
>  par new generation   total 3686400K, used 295857K [0x000000057ae00000, 
> 0x0000000674e00000, 0x0000000674e00000)
>   eden space 3276800K,   9% used [0x000000057ae00000, 0x000000058ceec4c0, 
> 0x0000000642e00000)
>   from space 409600K,   0% used [0x000000065be00000, 0x000000065be00000, 
> 0x0000000674e00000)
>   to   space 409600K,   0% used [0x0000000642e00000, 0x0000000642e00000, 
> 0x000000065be00000)
>  concurrent mark-sweep generation total 6389760K, used 6389759K 
> [0x0000000674e00000, 0x00000007fae00000, 0x00000007fae00000)
>  concurrent-mark-sweep perm gen total 48032K, used 28719K 
> [0x00000007fae00000, 0x00000007fdce8000, 0x0000000800000000)
> }
> 2015-01-21T23:02:29.204-0800: 1577.776: Total time for which application 
> threads were stopped: 36.1334050 seconds
> 2015-01-21T23:02:29.239-0800: 1577.810: Total time for which application 
> threads were stopped: 0.0060230 seconds
> 2015-01-21T23:02:29.239-0800: 1577.811: [GC [1 CMS-initial-mark: 
> 6389759K(6389760K)] 6769792K(10076160K), 0.3112760 secs] [Times: user=0.00 
> sys=0.00, real=0.31 secs]
> 2015-01-21T23:02:29.551-0800: 1578.122: Total time for which application 
> threads were stopped: 0.3118580 seconds
> 2015-01-21T23:02:29.551-0800: 1578.122: [CMS-concurrent-mark-start]
> 2015-01-21T23:02:29.635-0800: 1578.206: Total time for which application 
> threads were stopped: 0.0060250 seconds
> Machines are i2.2xlarge (8 cores, 60 GB RAM); the data dir is on SSD 
> ephemeral storage. Heap size is 10 GB with a 4 GB new gen (following a DSE 
> recommendation to solve another issue with many ParNew GCs going on).
> It is a 2-DC cluster: 8 nodes in the west, 17 nodes in the east (the main 
> DC). The workload is read-heavy (15k writes per second and at least that 
> many reads per second right now due to the problems, but it was as high as 
> 35k reads per second in the past).
> The yaml and env files are attached.
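
The stop-the-world pauses quoted above can be summarized with a short script. This is only a sketch, assuming the standard HotSpot `-XX:+PrintGCApplicationStoppedTime` log format shown in the gc.log.0 excerpts; the threshold value is an arbitrary example, not anything from the report:

```python
import re

# Match the "Total time for which application threads were stopped"
# lines that HotSpot emits with -XX:+PrintGCApplicationStoppedTime,
# as seen in the gc.log.0 excerpts above.
PAUSE_RE = re.compile(
    r"Total time for which application threads were stopped: "
    r"([0-9.]+) seconds"
)

def long_pauses(lines, threshold_secs=1.0):
    """Return the duration of every safepoint pause at or over the threshold."""
    pauses = []
    for line in lines:
        m = PAUSE_RE.search(line)
        if m:
            secs = float(m.group(1))
            if secs >= threshold_secs:
                pauses.append(secs)
    return pauses

# Example using the three pause lines quoted in the report:
sample = [
    "2015-01-21T23:02:29.204-0800: 1577.776: Total time for which "
    "application threads were stopped: 36.1334050 seconds",
    "2015-01-21T23:02:29.239-0800: 1577.810: Total time for which "
    "application threads were stopped: 0.0060230 seconds",
    "2015-01-21T23:02:29.551-0800: 1578.122: Total time for which "
    "application threads were stopped: 0.3118580 seconds",
]
print(long_pauses(sample))
```

Run over the full gc.log.0 from the affected node, this would show how often the 60-second CMS pauses the reporter describes actually recur while the big compactions are running.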



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)