[ https://issues.apache.org/jira/browse/CASSANDRA-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17793935#comment-17793935 ]
Brad Schoening edited comment on CASSANDRA-18762 at 12/6/23 9:13 PM:
---------------------------------------------------------------------

An update: we are still seeing this occur on a cluster. It is configured with native_transport_max_threads = 256, and a large number of repair Merkle trees precedes the OOM crash.

System: Apache Cassandra 4.0.10, 16GB heap, 64GB RAM, 8 vCPUs, file_cache_size_in_mb = 4096, G1HeapRegionSize=16M

!image-2023-12-06-15-58-55-007.png!

The graph above is missing time ticks, but the spike occurs at 06:16:00.

!image-2023-12-06-15-29-31-491.png!

Summary of the Cassandra log:
{noformat}
11:17:10,289 [INFO ] RepairSession.java:202 - [repair #838c24c0-935f-11ee-97ba-d79b6a12ccbe] Received merkle tree for table1 from ... [repeated 35 times]
11:17:17,155 [INFO ] [Service Thread] cluster_id=99 ip_address=10.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 694ms. G1 Eden Space: 8925478912 -> 0; G1 Old Gen: 2196360784 -> 1133473904; G1 Survivor Space: 385875968 -> 0;
11:17:17,668 [INFO ] [Service Thread] cluster_id=99 ip_address=10.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 505ms. G1 Old Gen: 1133473904 -> 1133526408;
11:17:22,420 [INFO ] [ScheduledTasks:1] cluster_id=99 ip_address=10.0.0.1 NoSpamLogger.java:92 - Some operations were slow, details available at debug level (debug.log)
11:17:22,417 [INFO ] [Service Thread] cluster_id=99 ip_address=10.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 787ms. G1 Eden Space: 16777216 -> 0; G1 Old Gen: 1133526408 -> 1133545448;
11:17:23,213 [WARN ] [Service Thread] cluster_id=99 ip_address=10.0.0.1 GCInspector.java:292 - G1 Old Generation GC in 4742ms. G1 Old Gen: 1133545448 -> 1133581144;
11:17:23,217 [INFO ] [Service Thread] cluster_id=99 ip_address=10.0.0.1 StatusLogger.java:65 .... [abbr]
StatusLogger.java:65 - Pool Name                  Active Pending Completed Blocked All Time Blocked
StatusLogger.java:69 - ReadStage                       1       0  48261572       0                0
StatusLogger.java:69 - Native-Transport-Requests       1       0 395189663       0                0
StatusLogger.java:69 - ValidationExecutor              4      73    110086       0                0
StatusLogger.java:69 - AntiEntropyStage                1       0    352704       0                0
11:17:24,114 [INFO ] [Service Thread] cluster_id=99 ip_address=10.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 853ms. G1 Eden Space: 117440512 -> 0; G1 Old Gen: 1133747360 -> 1133758448;
11:17:24,564 [ERROR] [Messaging-EventLoop-3-5] cluster_id=99 ip_address=10.0.0.1 JVMStabilityInspector.java:102 - OutOfMemory error letting the JVM handle the error:
java.lang.OutOfMemoryError: Direct buffer memory
    at java.base/java.nio.Bits.reserveMemory(Bits.java:175)
    at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118)
    at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318)
    ... etc
{noformat}
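For context, "Direct buffer memory" OOMs are thrown by java.nio.Bits.reserveMemory() when cumulative direct allocations hit -XX:MaxDirectMemorySize, which defaults to the max heap size (-Xmx) when unset. A minimal stand-alone sketch (hypothetical class name, not Cassandra code) that drives the same failure path as the stack trace above:

{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Run with, e.g.: java -XX:MaxDirectMemorySize=256m DirectOomRepro
public class DirectOomRepro
{
    public static void main(String[] args)
    {
        List<ByteBuffer> retained = new ArrayList<>();
        while (true)
        {
            // Each allocateDirect() reserves native memory against the
            // MaxDirectMemorySize limit. Holding the references keeps the
            // Cleaner from freeing them -- analogous to many in-flight
            // off-heap merkle trees -- so reserveMemory() eventually throws
            // java.lang.OutOfMemoryError: Direct buffer memory.
            retained.add(ByteBuffer.allocateDirect(1 << 20)); // 1 MiB each
        }
    }
}
{code}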
> Repair triggers OOM with direct buffer memory
> ---------------------------------------------
>
>                 Key: CASSANDRA-18762
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18762
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Repair
>            Reporter: Brad Schoening
>            Priority: Normal
>              Labels: OutOfMemoryError
>         Attachments: Cluster-dm-metrics-1.PNG, image-2023-12-06-15-28-05-459.png, image-2023-12-06-15-29-31-491.png, image-2023-12-06-15-58-55-007.png
>
> We are seeing repeated failures of nodes with a 16GB heap and the same size (16GB) for direct memory (derived from -Xmx). This appears related to CASSANDRA-15202, which moved Merkle trees off-heap in 4.0. Using Cassandra 4.0.6.
> {noformat}
> 2023-08-09 04:30:57,470 [INFO ] [AntiEntropyStage:1] cluster_id=101 ip_address=169.0.0.1 RepairSession.java:202 - [repair #5e55a3b0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_a from /169.102.200.241:7000
> 2023-08-09 04:30:57,567 [INFO ] [AntiEntropyStage:1] cluster_id=101 ip_address=169.0.0.1 RepairSession.java:202 - [repair #5e0d2900-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from /169.93.192.29:7000
> 2023-08-09 04:30:57,568 [INFO ] [AntiEntropyStage:1] cluster_id=101 ip_address=169.0.0.1 RepairSession.java:202 - [repair #5e1dcad0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_c from /169.104.171.134:7000
> 2023-08-09 04:30:57,591 [INFO ] [AntiEntropyStage:1] cluster_id=101 ip_address=169.0.0.1 RepairSession.java:202 - [repair #5e69a0e0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from /169.79.232.67:7000
> 2023-08-09 04:30:57,876 [INFO ] [Service Thread] cluster_id=101 ip_address=169.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 282ms. Compressed Class Space: 8444560 -> 8372152; G1 Eden Space: 7809794048 -> 0; G1 Old Gen: 1453478400 -> 820942800; G1 Survivor Space: 419430400 -> 0; Metaspace: 80411136 -> 80176528
> 2023-08-09 04:30:58,387 [ERROR] [AntiEntropyStage:1] cluster_id=101 ip_address=169.0.0.1 JVMStabilityInspector.java:102 - OutOfMemory error letting the JVM handle the error:
> java.lang.OutOfMemoryError: Direct buffer memory
>     at java.base/java.nio.Bits.reserveMemory(Bits.java:175)
>     at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118)
>     at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318)
>     at org.apache.cassandra.utils.MerkleTree.allocate(MerkleTree.java:742)
>     at org.apache.cassandra.utils.MerkleTree.deserializeOffHeap(MerkleTree.java:780)
>     at org.apache.cassandra.utils.MerkleTree.deserializeTree(MerkleTree.java:751)
>     at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:720)
>     at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:698)
>     at org.apache.cassandra.utils.MerkleTrees$MerkleTreesSerializer.deserialize(MerkleTrees.java:416)
>     at org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:100)
>     at org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:84)
>     at org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:782)
>     at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:642)
>     at org.apache.cassandra.net.InboundMessageHandler$LargeMessage.deserialize(InboundMessageHandler.java:364)
>     at org.apache.cassandra.net.InboundMessageHandler$LargeMessage.access$1100(InboundMessageHandler.java:317)
>     at org.apache.cassandra.net.InboundMessageHandler$ProcessLargeMessage.provideMessage(InboundMessageHandler.java:504)
>     at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:429)
>     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> {noformat}
>
> -XX:+AlwaysPreTouch
> -XX:+CrashOnOutOfMemoryError
> -XX:+ExitOnOutOfMemoryError
> -XX:+HeapDumpOnOutOfMemoryError
> -XX:+ParallelRefProcEnabled
> -XX:+PerfDisableSharedMem
> -XX:+ResizeTLAB
> -XX:+UseG1GC
> -XX:+UseNUMA
> -XX:+UseTLAB
> -XX:+UseThreadPriorities
> -XX:-UseBiasedLocking
> -XX:CompileCommandFile=/opt/nosql/clusters/cassandra-101/conf/hotspot_compiler
> -XX:G1RSetUpdatingPauseTimePercent=5
> -XX:G1ReservePercent=20
> -XX:HeapDumpPath=/opt/nosql/data/cluster_101/cassandra-1691623098-pid2804737.hprof
> -XX:InitiatingHeapOccupancyPercent=70
> -XX:MaxGCPauseMillis=200
> -XX:StringTableSize=60013
> -Xlog:gc*:file=/opt/nosql/clusters/cassandra-101/logs/gc.log:time,uptime:filecount=10,filesize=10485760
> -Xms16G
> -Xmx16G
> -Xss256k
>
> From our Prometheus metrics, the behavior shows direct buffer memory ramping up until it reaches the maximum, at which point the node fails with an OOM. It appears that direct memory is never released by the JVM until it is exhausted.
>
> !Cluster-dm-metrics.PNG!
>
> An Eclipse Memory Analyzer class histogram:
> ||Class Name||Objects||Shallow Heap||Retained Heap||
> |java.lang.Object[]|445,014|42,478,160|>= 4,603,280,344|
> |io.netty.util.concurrent.FastThreadLocalThread|167|21,376|>= 4,467,294,736|
>
> Leaks: Problem Suspect 1
> The thread *io.netty.util.concurrent.FastThreadLocalThread @ 0x501dd5930 AntiEntropyStage:1* keeps local variables with total size *4,295,042,472 (84.00%)* bytes.
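Note that the flag list above sets -Xms/-Xmx but no -XX:MaxDirectMemorySize, so the direct memory ceiling falls back to the 16GB heap size. A quick way to confirm the effective setting inside a JVM (hypothetical class name; the same calls work from jshell):

{code:java}
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class DirectLimitCheck
{
    public static void main(String[] args)
    {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Prints "0" when the flag is unset, in which case the JVM uses
        // the max heap size (-Xmx) as the direct memory ceiling.
        System.out.println("MaxDirectMemorySize = "
                           + hotspot.getVMOption("MaxDirectMemorySize").getValue());
        System.out.println("Runtime.maxMemory() = " + Runtime.getRuntime().maxMemory());
    }
}
{code}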
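The ramp-up visible in the Prometheus graphs can also be watched from inside the JVM via the standard "direct" BufferPoolMXBean; a minimal polling sketch (hypothetical class name):

{code:java}
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

public class DirectPoolWatch
{
    public static void main(String[] args) throws InterruptedException
    {
        while (true)
        {
            for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class))
            {
                // getMemoryUsed() on the "direct" pool is the value that
                // ramps toward the 16GB ceiling before the OOM.
                if ("direct".equals(pool.getName()))
                    System.out.printf("direct buffers: count=%d used=%d capacity=%d%n",
                                      pool.getCount(), pool.getMemoryUsed(), pool.getTotalCapacity());
            }
            Thread.sleep(5_000); // poll every 5 seconds
        }
    }
}
{code}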