[ https://issues.apache.org/jira/browse/CASSANDRA-13365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16704616#comment-16704616 ]
Sergey Kirillov commented on CASSANDRA-13365:
---------------------------------------------

[~sickcate] do you have any materialized views in your database?
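A quick way to check is to query system_schema.views. A minimal sketch using the Python driver (the contact point below is a placeholder, not an address from this ticket):

{noformat}
# Sketch: list any materialized views via system_schema.views.
# Requires the DataStax Python driver (pip install cassandra-driver);
# the contact point is a placeholder, not an address from this ticket.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()
rows = session.execute(
    "SELECT keyspace_name, view_name, base_table_name FROM system_schema.views")
for row in rows:
    print(row.keyspace_name, row.view_name, "->", row.base_table_name)
cluster.shutdown()
{noformat}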
> Nodes entering GC loop, does not recover
> ----------------------------------------
>
>                 Key: CASSANDRA-13365
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13365
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: 34-node cluster over 4 DCs
> Linux CentOS 7.2 x86
> Mix of 64GB/128GB RAM / node
> Mix of 32/40 hardware threads / node, Xeon ~2.4Ghz
> High read volume, low write volume, occasional sstable bulk loading
>            Reporter: Mina Naguib
>            Priority: Major
>
> Over the last week we've been observing two related problems affecting our Cassandra cluster.
>
> Problem 1: 1 to a few nodes per DC entering a GC loop, not recovering
>
> Checking the heap usage stats, there's a sudden jump of 1-3GB. Some nodes recover, but some don't and log this:
> {noformat}
> 2017-03-21T11:23:02.957-0400: 54099.519: [Full GC (Allocation Failure)  13G->11G(14G), 29.4127307 secs]
> 2017-03-21T11:23:45.270-0400: 54141.833: [Full GC (Allocation Failure)  13G->12G(14G), 28.1561881 secs]
> 2017-03-21T11:24:20.307-0400: 54176.869: [Full GC (Allocation Failure)  13G->13G(14G), 27.7019501 secs]
> 2017-03-21T11:24:50.528-0400: 54207.090: [Full GC (Allocation Failure)  13G->13G(14G), 27.1372267 secs]
> 2017-03-21T11:25:19.190-0400: 54235.752: [Full GC (Allocation Failure)  13G->13G(14G), 27.0703975 secs]
> 2017-03-21T11:25:46.711-0400: 54263.273: [Full GC (Allocation Failure)  13G->13G(14G), 27.3187768 secs]
> 2017-03-21T11:26:15.419-0400: 54291.981: [Full GC (Allocation Failure)  13G->13G(14G), 26.9493405 secs]
> 2017-03-21T11:26:43.399-0400: 54319.961: [Full GC (Allocation Failure)  13G->13G(14G), 27.5222085 secs]
> 2017-03-21T11:27:11.383-0400: 54347.945: [Full GC (Allocation Failure)  13G->13G(14G), 27.1769581 secs]
> 2017-03-21T11:27:40.174-0400: 54376.737: [Full GC (Allocation Failure)  13G->13G(14G), 27.4639031 secs]
> 2017-03-21T11:28:08.946-0400: 54405.508: [Full GC (Allocation Failure)  13G->13G(14G), 30.3480523 secs]
> 2017-03-21T11:28:40.117-0400: 54436.680: [Full GC (Allocation Failure)  13G->13G(14G), 27.8220513 secs]
> 2017-03-21T11:29:08.459-0400: 54465.022: [Full GC (Allocation Failure)  13G->13G(14G), 27.4691271 secs]
> 2017-03-21T11:29:37.114-0400: 54493.676: [Full GC (Allocation Failure)  13G->13G(14G), 27.0275733 secs]
> 2017-03-21T11:30:04.635-0400: 54521.198: [Full GC (Allocation Failure)  13G->13G(14G), 27.1902627 secs]
> 2017-03-21T11:30:32.114-0400: 54548.676: [Full GC (Allocation Failure)  13G->13G(14G), 27.8872850 secs]
> 2017-03-21T11:31:01.430-0400: 54577.993: [Full GC (Allocation Failure)  13G->13G(14G), 27.1609706 secs]
> 2017-03-21T11:31:29.024-0400: 54605.587: [Full GC (Allocation Failure)  13G->13G(14G), 27.3635138 secs]
> 2017-03-21T11:31:57.303-0400: 54633.865: [Full GC (Allocation Failure)  13G->13G(14G), 27.4143510 secs]
> 2017-03-21T11:32:25.110-0400: 54661.672: [Full GC (Allocation Failure)  13G->13G(14G), 27.8595986 secs]
> 2017-03-21T11:32:53.922-0400: 54690.485: [Full GC (Allocation Failure)  13G->13G(14G), 27.5242543 secs]
> 2017-03-21T11:33:21.867-0400: 54718.429: [Full GC (Allocation Failure)  13G->13G(14G), 30.8930130 secs]
> 2017-03-21T11:33:53.712-0400: 54750.275: [Full GC (Allocation Failure)  13G->13G(14G), 27.6523013 secs]
> 2017-03-21T11:34:21.760-0400: 54778.322: [Full GC (Allocation Failure)  13G->13G(14G), 27.3030198 secs]
> 2017-03-21T11:34:50.073-0400: 54806.635: [Full GC (Allocation Failure)  13G->13G(14G), 27.1594154 secs]
> 2017-03-21T11:35:17.743-0400: 54834.306: [Full GC (Allocation Failure)  13G->13G(14G), 27.3766949 secs]
> 2017-03-21T11:35:45.797-0400: 54862.360: [Full GC (Allocation Failure)  13G->13G(14G), 27.5756770 secs]
> 2017-03-21T11:36:13.816-0400: 54890.378: [Full GC (Allocation Failure)  13G->13G(14G), 27.5541813 secs]
> 2017-03-21T11:36:41.926-0400: 54918.488: [Full GC (Allocation Failure)  13G->13G(14G), 33.7510103 secs]
> 2017-03-21T11:37:16.132-0400: 54952.695: [Full GC (Allocation Failure)  13G->13G(14G), 27.4856611 secs]
> 2017-03-21T11:37:44.454-0400: 54981.017: [Full GC (Allocation Failure)  13G->13G(14G), 28.1269335 secs]
> 2017-03-21T11:38:12.774-0400: 55009.337: [Full GC (Allocation Failure)  13G->13G(14G), 27.7830448 secs]
> 2017-03-21T11:38:40.840-0400: 55037.402: [Full GC (Allocation Failure)  13G->13G(14G), 27.3527326 secs]
> 2017-03-21T11:39:08.610-0400: 55065.173: [Full GC (Allocation Failure)  13G->13G(14G), 27.5828941 secs]
> 2017-03-21T11:39:36.833-0400: 55093.396: [Full GC (Allocation Failure)  13G->13G(14G), 27.9303030 secs]
> 2017-03-21T11:40:05.265-0400: 55121.828: [Full GC (Allocation Failure)  13G->13G(14G), 36.9902867 secs]
> 2017-03-21T11:40:42.400-0400: 55158.963: [Full GC (Allocation Failure)  13G->13G(14G), 27.6835744 secs]
> 2017-03-21T11:41:10.529-0400: 55187.091: [Full GC (Allocation Failure)  13G->13G(14G), 27.1899555 secs]
> 2017-03-21T11:41:38.018-0400: 55214.581: [Full GC (Allocation Failure)  13G->13G(14G), 27.7309706 secs]
> 2017-03-21T11:42:06.062-0400: 55242.624: [Full GC (Allocation Failure)  13G->13G(14G), 29.2103234 secs]
> 2017-03-21T11:42:35.579-0400: 55272.141: [Full GC (Allocation Failure)  13G->13G(14G), 27.9902741 secs]
> 2017-03-21T11:43:03.955-0400: 55300.518: [Full GC (Allocation Failure)  13G->13G(14G), 28.0764545 secs]
> 2017-03-21T11:43:32.186-0400: 55328.748: [Full GC (Allocation Failure)  13G->13G(14G), 28.3913296 secs]
> 2017-03-21T11:44:00.810-0400: 55357.373: [Full GC (Allocation Failure)  13G->13G(14G), 27.0945131 secs]
> 2017-03-21T11:44:28.343-0400: 55384.905: [Full GC (Allocation Failure)  13G->13G(14G), 27.3357763 secs]
> 2017-03-21T11:44:55.909-0400: 55412.471: [Full GC (Allocation Failure)  13G->13G(14G), 27.9216626 secs]
> 2017-03-21T11:45:24.070-0400: 55440.632: [Full GC (Allocation Failure)  13G->13G(14G), 29.8189953 secs]
> 2017-03-21T11:45:54.092-0400: 55470.654: [Full GC (Allocation Failure)  13G->13G(14G), 27.9992085 secs]
> 2017-03-21T11:46:22.219-0400: 55498.782: [Full GC (Allocation Failure)  13G->13G(14G), 27.4622100 secs]
> 2017-03-21T11:46:49.983-0400: 55526.545: [Full GC (Allocation Failure)  13G->13G(14G), 27.9159143 secs]
> 2017-03-21T11:47:18.029-0400: 55554.592: [Full GC (Allocation Failure)  13G->13G(14G), 28.5207919 secs]
> 2017-03-21T11:47:46.938-0400: 55583.500: [Full GC (Allocation Failure)  13G->13G(14G), 27.0355291 secs]
> 2017-03-21T11:48:14.235-0400: 55610.798: [Full GC (Allocation Failure)  13G->13G(14G), 35.6284865 secs]
> 2017-03-21T11:48:50.013-0400: 55646.575: [Full GC (Allocation Failure)  13G->13G(14G), 27.3299061 secs]
> 2017-03-21T11:49:17.592-0400: 55674.154: [Full GC (Allocation Failure)  13G->13G(14G), 27.2544348 secs]
> {noformat}
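A rough way to quantify the loop above is to tally the pause times against the uptime stamps; a minimal sketch (the log file path is a placeholder, not from this ticket):

{noformat}
# Sketch: estimate how much wall-clock time such a GC log spends inside
# Full GC pauses. "gc.log" is a placeholder path, not from this ticket.
import re
import sys

ts_re = re.compile(r"^\S+: ([\d.]+): \[Full GC")
pause_re = re.compile(r"([\d.]+) secs\]")

uptimes, pauses = [], []
with open(sys.argv[1] if len(sys.argv) > 1 else "gc.log") as fh:
    for line in fh:
        ts = ts_re.search(line)
        if ts:
            uptimes.append(float(ts.group(1)))
        p = pause_re.search(line)
        if p:
            pauses.append(float(p.group(1)))

if len(uptimes) > 1 and pauses:
    window = uptimes[-1] - uptimes[0] + pauses[-1]
    print(f"{len(pauses)} Full GCs, {sum(pauses):.1f}s paused in a "
          f"{window:.1f}s window ({100 * sum(pauses) / window:.0f}% of wall clock)")
{noformat}

For the excerpt above, the pauses account for essentially the entire window, which matches the node never getting a chance to recover.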
> When a node enters the above state, it becomes useless over 1-5 minutes:
> * Client connections die
> * New connections time out
> * CPU usage drops - all threads do nothing except for 1 thread near 100% CPU ("VM Thread" os_prio=0 tid=0x00007ffaf8209bf0 nid=0x2d53 runnable), doing the above GC in a loop
> * JMX becomes unqueryable
> * Regular kill signal is ignored (needs kill -9)
>
> I've managed to extract a full heap dump of a node while in this state, but it's 20GB and I have not been able to find a tool that can analyze it.
>
> I've extracted a histogram of heap objects. From a healthy node:
> {noformat}
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    1:      12688007     1116544616  io.netty.buffer.PooledUnsafeDirectByteBuf
>    2:      24938940      798046080  io.netty.util.Recycler$DefaultHandle
>    3:      11457548      733283072  io.netty.channel.ChannelOutboundBuffer$Entry
>    4:       9582138      459942624  java.nio.HeapByteBuffer
>    5:         16974      393432416  [I
>    6:       2014719      311356992  [B
>    7:       4982167      181324848  [Ljava.lang.Object;
>    8:          8531      118782704  [Lio.netty.util.Recycler$DefaultHandle;
>    9:       2827662      113106480  org.apache.cassandra.db.rows.BufferCell
>   10:       2737816       87610112  org.apache.cassandra.db.rows.BTreeRow
>   11:       2740110       70228240  [Ljava.nio.ByteBuffer;
>   12:       2738081       65713944  org.apache.cassandra.db.BufferClustering
>   13:       1763956       56446592  com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node
>   14:       1763736       56439552  com.googlecode.concurrentlinkedhashmap.ConcurrentHashMapV8$Node
>   15:       2205825       52939800  com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue
>   16:       1617026       51744832  java.util.concurrent.ConcurrentHashMap$Node
>   17:       1769934       42478416  org.apache.cassandra.cache.KeyCacheKey
> {noformat}
> From an unhealthy node:
> {noformat}
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    1:     103726026     4978849248  java.nio.HeapByteBuffer
>    2:      95646048     2832233872  [B
>    3:      28076465     1207916952  [Ljava.lang.Object;
>    4:      12803448     1126703424  io.netty.buffer.PooledUnsafeDirectByteBuf
>    5:      25054634      801748288  io.netty.util.Recycler$DefaultHandle
>    6:      11457548      733283072  io.netty.channel.ChannelOutboundBuffer$Entry
>    7:      24725380      593409120  java.util.ArrayList
>    8:       5808290      185865280  java.util.RegularEnumSet
>    9:       2870889      160769784  org.apache.cassandra.transport.messages.ResultMessage$Rows
>   10:       2937336      140992128  io.netty.buffer.SlicedAbstractByteBuf
>   11:          8854      118773984  [Lio.netty.util.Recycler$DefaultHandle;
>   12:       2830805      113232200  org.apache.cassandra.db.rows.BufferCell
>   13:       2937336       93994752  org.apache.cassandra.transport.Frame$Header
>   14:       2870928       91869696  org.apache.cassandra.cql3.ResultSet$ResultMetadata
>   15:       2728627       87316064  org.apache.cassandra.db.rows.BTreeRow
>   16:       2947033       70728792  java.util.concurrent.ConcurrentLinkedQueue$Node
>   17:       2937336       70496064  org.apache.cassandra.transport.Frame
> {noformat}
> The increase is clear in these top 3 object types:
> {noformat}
>    1:     103726026     4978849248  java.nio.HeapByteBuffer
>    2:      95646048     2832233872  [B
>    3:      28076465     1207916952  [Ljava.lang.Object;
> {noformat}
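One way to surface that growth without a heap-dump tool is to diff the two class histograms directly; a minimal sketch (file names are placeholders, and it assumes jmap -histo style columns as shown above):

{noformat}
# Sketch: diff two class histograms (healthy vs unhealthy) of the kind
# shown above and print the classes with the largest byte growth.
# File names are placeholders; histograms like these are typically
# captured with `jmap -histo <pid>`.
import re
import sys

ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")

def load(path):
    bytes_by_class = {}
    with open(path) as fh:
        for line in fh:
            m = ROW.match(line)
            if m:
                bytes_by_class[m.group(3)] = int(m.group(2))
    return bytes_by_class

healthy = load(sys.argv[1])    # e.g. histo-healthy.txt
unhealthy = load(sys.argv[2])  # e.g. histo-unhealthy.txt

growth = {cls: unhealthy[cls] - healthy.get(cls, 0) for cls in unhealthy}
for cls, delta in sorted(growth.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print(f"{delta:>15,d}  {cls}")
{noformat}

Run against the two dumps quoted above, the java.nio.HeapByteBuffer, [B and [Ljava.lang.Object; rows dominate, which is exactly the growth called out.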
> Other related observations:
>
> Client socket states: essentially the JVM has stopped working, and the recv-queue is piling up kernel-side but not being consumed app-side. Worth noting that the peers (clients) have given up and closed the connections on their end, but they still appear ESTABlished on Cassandra's side since it hasn't drained the read buffer to EOF, seen an error, or completed the close:
> {noformat}
> ESTAB  558612  0  172.30.72.133:9042  172.30.72.145:11418
> ESTAB  564172  0  172.30.72.133:9042  172.30.72.145:12420
> ESTAB  545284  0  172.30.72.133:9042  172.30.72.145:18212
> ESTAB  565468  0  172.30.72.133:9042  172.30.72.145:18346
> ESTAB  525884  0  172.30.72.133:9042  172.30.72.145:18352
> ESTAB  532324  0  172.30.72.133:9042  172.30.72.145:28896
> ESTAB  517592  0  172.30.72.133:9042  172.30.72.145:34504
> ESTAB  553392  0  172.30.72.133:9042  172.30.72.145:36926
> ESTAB  581952  0  172.30.72.133:9042  172.30.72.145:40684
> ESTAB  543668  0  172.30.72.133:9042  172.30.72.145:42282
> ESTAB  511860  0  172.30.72.133:9042  172.30.72.145:47350
> ESTAB  590380  0  172.30.72.133:9042  172.30.72.145:47566
> ESTAB  583120  0  172.30.72.133:9042  172.30.72.145:50386
> ESTAB  536772  0  172.30.72.133:9042  172.30.72.145:52656
> ESTAB  31      0  172.30.72.133:9042  172.30.72.145:52946
> ESTAB  532140  0  172.30.72.133:9042  172.30.72.145:53618
> ESTAB  14397   0  172.30.72.133:9042  172.30.72.150:10016
> ESTAB  6444    0  172.30.72.133:9042  172.30.72.150:12176
> ESTAB  7568    0  172.30.72.133:9042  172.30.72.150:12934
> ESTAB  139479  0  172.30.72.133:9042  172.30.72.150:13186
> ESTAB  6896    0  172.30.72.133:9042  172.30.72.150:17316
> ESTAB  7808    0  172.30.72.133:9042  172.30.72.150:17448
> ESTAB  8712    0  172.30.72.133:9042  172.30.72.150:17748
> ESTAB  6944    0  172.30.72.133:9042  172.30.72.150:18472
> ESTAB  136700  0  172.30.72.133:9042  172.30.72.150:19860
> ESTAB  16260   0  172.30.72.133:9042  172.30.72.150:20696
> ESTAB  8943    0  172.30.72.133:9042  172.30.72.150:21208
> ESTAB  132877  0  172.30.72.133:9042  172.30.72.150:21420
> ESTAB  6936    0  172.30.72.133:9042  172.30.72.150:25440
> ESTAB  92      0  172.30.72.133:9042  172.30.72.150:25778
> ESTAB  137312  0  172.30.72.133:9042  172.30.72.150:28258
> ESTAB  40      0  172.30.72.133:9042  172.30.72.150:28348
> ESTAB  6940    0  172.30.72.133:9042  172.30.72.150:29718
> ESTAB  6416    0  172.30.72.133:9042  172.30.72.150:29858
> ESTAB  6580    0  172.30.72.133:9042  172.30.72.150:30518
> ESTAB  4202    0  172.30.72.133:9042  172.30.72.150:30948
> ESTAB  6060    0  172.30.72.133:9042  172.30.72.150:35276
> ESTAB  0       0  172.30.72.133:9042  172.30.72.150:40114
> ESTAB  134303  0  172.30.72.133:9042  172.30.72.150:42212
> ESTAB  396     0  172.30.72.133:9042  172.30.72.150:44848
> ESTAB  136450  0  172.30.72.133:9042  172.30.72.150:47742
> ESTAB  8318    0  172.30.72.133:9042  172.30.72.150:51054
> ESTAB  11976   0  172.30.72.133:9042  172.30.72.150:54560
> ESTAB  136193  0  172.30.72.133:9042  172.30.72.150:56094
> ESTAB  138737  0  172.30.72.133:9042  172.30.72.150:60382
> ESTAB  7996    0  172.30.72.133:9042  172.30.72.150:60932
> ESTAB  137235  0  172.30.72.133:9042  172.30.72.150:61568
> ESTAB  8184    0  172.30.72.133:9042  172.30.72.150:63054
> ESTAB  985276  0  172.30.72.133:9042  172.30.72.175:23494
> ESTAB  601528  0  172.30.72.133:9042  172.30.72.175:24140
> ESTAB  557844  0  172.30.72.133:9042  172.30.72.175:42248
> ESTAB  469344  0  172.30.72.133:9042  172.30.72.175:59704
> ESTAB  507160  0  172.30.72.133:9042  172.30.72.185:11958
> ESTAB  449196  0  172.30.72.133:9042  172.30.72.185:15458
> ESTAB  487796  0  172.30.72.133:9042  172.30.72.185:16458
> ESTAB  441132  0  172.30.72.133:9042  172.30.72.185:23194
> ESTAB  486664  0  172.30.72.133:9042  172.30.72.185:25276
> ESTAB  554780  0  172.30.72.133:9042  172.30.72.185:29102
> ESTAB  496248  0  172.30.72.133:9042  172.30.72.185:29428
> ESTAB  420048  0  172.30.72.133:9042  172.30.72.185:39650
> ESTAB  498704  0  172.30.72.133:9042  172.30.72.185:46424
> ESTAB  506488  0  172.30.72.133:9042  172.30.72.185:49722
> ESTAB  509416  0  172.30.72.133:9042  172.30.72.185:55838
> ESTAB  511956  0  172.30.72.133:9042  172.30.72.185:57670
> ESTAB  496720  0  172.30.72.133:9042  172.30.72.185:59884
> {noformat}
> perf top during this state:
> {noformat}
>   19.73%  libjvm.so  [.] InstanceKlass::oop_follow_contents
>   14.05%  libjvm.so  [.] ObjArrayKlass::oop_follow_contents
>    7.83%  libjvm.so  [.] MarkSweep::follow_stack
>    6.38%  libjvm.so  [.] HeapRegion::block_size
>    5.41%  libjvm.so  [.] OtherRegionsTable::add_reference
>    5.12%  libjvm.so  [.] UpdateRSOopClosure::do_oop
>    4.77%  libjvm.so  [.] HeapRegion::block_is_obj
> {noformat}
>
> Frequency: affecting nodes in our 2 busier DCs (the other 2 DCs serve less volume), ~2-10 times per day. Not always the same nodes (though lately 2 hiccup more frequently than the others).
>
> Tried:
> * Upgrading Cassandra from 3.9 to 3.10
> * Increasing heap size from 8GB to 14GB
> * Upgrading JVM 1.8 from u112 to u121
> * Switching from CMS to G1GC
>
> The last change, switching from CMS to G1GC, reduced the frequency of the problem but has not eliminated it.
>
> Problem 2: Read volume in DC depressed
>
> With our RF set to 2 per DC, if 2 or more nodes in a DC enter the GC loop described in problem 1, all nodes in the same DC accumulate very high active/pending/blocked Native-Transport request queues. Total read volume for the DC drops to 10%-30% of normal.
>
> Tried:
> * Increasing -Dcassandra.max_queued_native_transport_requests to 4096
> * Reducing read_request_timeout_in_ms to 400
> * Reducing write_request_timeout_in_ms to 800
>
> Improved things slightly, but not by much.
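To watch for the queue buildup described under problem 2, a small polling sketch around nodetool tpstats (it assumes the usual Native-Transport-Requests row and the Active/Pending/Completed/Blocked column order, which may need adjusting for your version):

{noformat}
# Sketch: poll `nodetool tpstats` and print the Native-Transport-Requests
# pool, the queue described as backing up DC-wide in problem 2. The column
# order (Active Pending Completed Blocked "All time blocked") is an
# assumption about the 3.x output; adjust the indexes if your layout differs.
import subprocess
import time

while True:
    out = subprocess.run(["nodetool", "tpstats"],
                         capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        if line.startswith("Native-Transport-Requests"):
            parts = line.split()
            print(f"active={parts[1]} pending={parts[2]} blocked={parts[4]}")
    time.sleep(10)
{noformat}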