[ https://issues.apache.org/jira/browse/CASSANDRA-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17793935#comment-17793935 ]
Brad Schoening edited comment on CASSANDRA-18762 at 12/6/23 9:13 PM:
---------------------------------------------------------------------

An update: we are still seeing this occur on a cluster. It is configured with native_transport_max_threads = 256, and a large number of repair Merkle trees precedes the OOM crash.

System: Apache Cassandra 4.0.10, 16GB heap, 64GB RAM, 8 vCPUs, file_cache_size_in_mb = 4096, G1HeapRegionSize=16M

!image-2023-12-06-15-58-55-007.png!

The graph above is missing time ticks, but the spike occurs at 06:16:00.

!image-2023-12-06-15-29-31-491.png!

Summary of the Cassandra log:
{noformat}
11:17:10,289 [INFO ] RepairSession.java:202 - [repair #838c24c0-935f-11ee-97ba-d79b6a12ccbe] Received merkle tree for table1 from ... [repeated 35 times]
11:17:17,155 [INFO ] [Service Thread] cluster_id=99 ip_address=10.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 694ms. G1 Eden Space: 8925478912 -> 0; G1 Old Gen: 2196360784 -> 1133473904; G1 Survivor Space: 385875968 -> 0;
11:17:17,668 [INFO ] [Service Thread] cluster_id=99 ip_address=10.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 505ms. G1 Old Gen: 1133473904 -> 1133526408;
11:17:22,420 [INFO ] [ScheduledTasks:1] cluster_id=99 ip_address=10.0.0.1 NoSpamLogger.java:92 - Some operations were slow, details available at debug level (debug.log)
11:17:22,417 [INFO ] [Service Thread] cluster_id=99 ip_address=10.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 787ms. G1 Eden Space: 16777216 -> 0; G1 Old Gen: 1133526408 -> 1133545448;
11:17:23,213 [WARN ] [Service Thread] cluster_id=99 ip_address=10.0.0.1 GCInspector.java:292 - G1 Old Generation GC in 4742ms. G1 Old Gen: 1133545448 -> 1133581144;
11:17:23,217 [INFO ] [Service Thread] cluster_id=99 ip_address=10.0.0.1 StatusLogger.java:65 .... [abbr]
StatusLogger.java:65 - Pool Name                  Active Pending Completed Blocked All Time Blocked
StatusLogger.java:69 - ReadStage                       1       0  48261572       0                0
StatusLogger.java:69 - Native-Transport-Requests       1       0 395189663       0                0
StatusLogger.java:69 - ValidationExecutor              4      73    110086       0                0
StatusLogger.java:69 - AntiEntropyStage                1       0    352704       0                0
11:17:24,114 [INFO ] [Service Thread] cluster_id=99 ip_address=10.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 853ms. G1 Eden Space: 117440512 -> 0; G1 Old Gen: 1133747360 -> 1133758448;
11:17:24,564 [ERROR] [Messaging-EventLoop-3-5] cluster_id=99 ip_address=10.0.0.1 JVMStabilityInspector.java:102 - OutOfMemory error letting the JVM handle the error:
java.lang.OutOfMemoryError: Direct buffer memory
    at java.base/java.nio.Bits.reserveMemory(Bits.java:175)
    at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118)
    at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318)
    ... etc
{noformat}
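For context, "Direct buffer memory" OOMs are thrown by java.nio.Bits.reserveMemory() when cumulative direct allocations hit -XX:MaxDirectMemorySize, which defaults to the max heap size (-Xmx) when unset. A minimal stand-alone sketch (hypothetical class name, not Cassandra code) that drives the same failure path as the stack trace above:

{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Run with, e.g.: java -XX:MaxDirectMemorySize=256m DirectOomRepro
public class DirectOomRepro
{
    public static void main(String[] args)
    {
        List<ByteBuffer> retained = new ArrayList<>();
        while (true)
        {
            // Each allocateDirect() reserves native memory against the
            // MaxDirectMemorySize limit. Holding the references keeps the
            // Cleaner from freeing them -- analogous to many in-flight
            // off-heap merkle trees -- so reserveMemory() eventually throws
            // java.lang.OutOfMemoryError: Direct buffer memory.
            retained.add(ByteBuffer.allocateDirect(1 << 20)); // 1 MiB each
        }
    }
}
{code}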
> Repair triggers OOM with direct buffer memory
> ---------------------------------------------
>
>                 Key: CASSANDRA-18762
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18762
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Repair
>            Reporter: Brad Schoening
>            Priority: Normal
>              Labels: OutOfMemoryError
>         Attachments: Cluster-dm-metrics-1.PNG, image-2023-12-06-15-28-05-459.png, image-2023-12-06-15-29-31-491.png, image-2023-12-06-15-58-55-007.png
>
> We are seeing repeated failures of nodes with a 16GB heap and the same size (16GB) for direct memory (derived from -Xmx). This appears related to CASSANDRA-15202, which moved Merkle trees off-heap in 4.0. Using Cassandra 4.0.6.
> {noformat}
> 2023-08-09 04:30:57,470 [INFO ] [AntiEntropyStage:1] cluster_id=101 ip_address=169.0.0.1 RepairSession.java:202 - [repair #5e55a3b0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_a from /169.102.200.241:7000
> 2023-08-09 04:30:57,567 [INFO ] [AntiEntropyStage:1] cluster_id=101 ip_address=169.0.0.1 RepairSession.java:202 - [repair #5e0d2900-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from /169.93.192.29:7000
> 2023-08-09 04:30:57,568 [INFO ] [AntiEntropyStage:1] cluster_id=101 ip_address=169.0.0.1 RepairSession.java:202 - [repair #5e1dcad0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_c from /169.104.171.134:7000
> 2023-08-09 04:30:57,591 [INFO ] [AntiEntropyStage:1] cluster_id=101 ip_address=169.0.0.1 RepairSession.java:202 - [repair #5e69a0e0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from /169.79.232.67:7000
> 2023-08-09 04:30:57,876 [INFO ] [Service Thread] cluster_id=101 ip_address=169.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 282ms. Compressed Class Space: 8444560 -> 8372152; G1 Eden Space: 7809794048 -> 0; G1 Old Gen: 1453478400 -> 820942800; G1 Survivor Space: 419430400 -> 0; Metaspace: 80411136 -> 80176528
> 2023-08-09 04:30:58,387 [ERROR] [AntiEntropyStage:1] cluster_id=101 ip_address=169.0.0.1 JVMStabilityInspector.java:102 - OutOfMemory error letting the JVM handle the error:
> java.lang.OutOfMemoryError: Direct buffer memory
>     at java.base/java.nio.Bits.reserveMemory(Bits.java:175)
>     at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118)
>     at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318)
>     at org.apache.cassandra.utils.MerkleTree.allocate(MerkleTree.java:742)
>     at org.apache.cassandra.utils.MerkleTree.deserializeOffHeap(MerkleTree.java:780)
>     at org.apache.cassandra.utils.MerkleTree.deserializeTree(MerkleTree.java:751)
>     at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:720)
>     at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:698)
>     at org.apache.cassandra.utils.MerkleTrees$MerkleTreesSerializer.deserialize(MerkleTrees.java:416)
>     at org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:100)
>     at org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:84)
>     at org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:782)
>     at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:642)
>     at org.apache.cassandra.net.InboundMessageHandler$LargeMessage.deserialize(InboundMessageHandler.java:364)
>     at org.apache.cassandra.net.InboundMessageHandler$LargeMessage.access$1100(InboundMessageHandler.java:317)
>     at org.apache.cassandra.net.InboundMessageHandler$ProcessLargeMessage.provideMessage(InboundMessageHandler.java:504)
>     at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:429)
>     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> {noformat}
>
> -XX:+AlwaysPreTouch
> -XX:+CrashOnOutOfMemoryError
> -XX:+ExitOnOutOfMemoryError
> -XX:+HeapDumpOnOutOfMemoryError
> -XX:+ParallelRefProcEnabled
> -XX:+PerfDisableSharedMem
> -XX:+ResizeTLAB
> -XX:+UseG1GC
> -XX:+UseNUMA
> -XX:+UseTLAB
> -XX:+UseThreadPriorities
> -XX:-UseBiasedLocking
> -XX:CompileCommandFile=/opt/nosql/clusters/cassandra-101/conf/hotspot_compiler
> -XX:G1RSetUpdatingPauseTimePercent=5
> -XX:G1ReservePercent=20
> -XX:HeapDumpPath=/opt/nosql/data/cluster_101/cassandra-1691623098-pid2804737.hprof
> -XX:InitiatingHeapOccupancyPercent=70
> -XX:MaxGCPauseMillis=200
> -XX:StringTableSize=60013
> -Xlog:gc*:file=/opt/nosql/clusters/cassandra-101/logs/gc.log:time,uptime:filecount=10,filesize=10485760
> -Xms16G
> -Xmx16G
> -Xss256k
>
> From our Prometheus metrics, the behavior shows direct buffer memory ramping up until it reaches the maximum, at which point the node fails with an OOM. It appears that direct memory is never released by the JVM until it is exhausted.
>
> !Cluster-dm-metrics.PNG!
>
> An Eclipse Memory Analyzer class histogram:
> ||Class Name||Objects||Shallow Heap||Retained Heap||
> |java.lang.Object[]|445,014|42,478,160|>= 4,603,280,344|
> |io.netty.util.concurrent.FastThreadLocalThread|167|21,376|>= 4,467,294,736|
>
> Leaks: Problem Suspect 1
> The thread *io.netty.util.concurrent.FastThreadLocalThread @ 0x501dd5930 AntiEntropyStage:1* keeps local variables with total size *4,295,042,472 (84.00%)* bytes.
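Note that the flag list above sets -Xms/-Xmx but no -XX:MaxDirectMemorySize, so the direct memory ceiling falls back to the 16GB heap size. A quick way to confirm the effective setting inside a JVM (hypothetical class name; the same calls work from jshell):

{code:java}
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class DirectLimitCheck
{
    public static void main(String[] args)
    {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Prints "0" when the flag is unset, in which case the JVM uses
        // the max heap size (-Xmx) as the direct memory ceiling.
        System.out.println("MaxDirectMemorySize = "
                           + hotspot.getVMOption("MaxDirectMemorySize").getValue());
        System.out.println("Runtime.maxMemory() = " + Runtime.getRuntime().maxMemory());
    }
}
{code}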
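The ramp-up visible in the Prometheus graphs can also be watched from inside the JVM via the standard "direct" BufferPoolMXBean; a minimal polling sketch (hypothetical class name):

{code:java}
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

public class DirectPoolWatch
{
    public static void main(String[] args) throws InterruptedException
    {
        while (true)
        {
            for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class))
            {
                // getMemoryUsed() on the "direct" pool is the value that
                // ramps toward the 16GB ceiling before the OOM.
                if ("direct".equals(pool.getName()))
                    System.out.printf("direct buffers: count=%d used=%d capacity=%d%n",
                                      pool.getCount(), pool.getMemoryUsed(), pool.getTotalCapacity());
            }
            Thread.sleep(5_000); // poll every 5 seconds
        }
    }
}
{code}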