[ 
https://issues.apache.org/jira/browse/HDDS-11100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863973#comment-17863973
 ] 

Shilun Fan edited comment on HDDS-11100 at 7/9/24 1:21 AM:
-----------------------------------------------------------

In our Ozone cluster, we have long been troubled by DataNode (DN) memory issues. 
Below is our DN memory configuration:
{code:java}
-Xms64g -Xmx64g -Xmn16g -XX:MaxDirectMemorySize=32g
{code}
One may wonder why we configured such large memory settings. Our Ozone 
deployment is very large, with the biggest cluster currently exceeding 1K 
nodes, and each DN stores a large amount of data. Historically we have hit 
OOM (Out of Memory) issues, so we increased the memory allocation.
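
As a sanity check when debugging settings like these, the effective limits can be read back from the running JVM itself; a minimal sketch using standard JDK APIs (the HotSpot diagnostic MXBean; not Ozone code, and the flag name simply mirrors the -XX option above):
```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class MemoryLimits {
    public static void main(String[] args) {
        // -Xmx as seen by the runtime (upper bound of the heap, in bytes).
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap bytes: " + maxHeap);

        // -XX:MaxDirectMemorySize read back via the HotSpot diagnostic bean.
        // A value of "0" means the flag was not set, in which case the JVM
        // defaults the direct-memory budget to roughly the heap size.
        HotSpotDiagnosticMXBean hotspot =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        String maxDirect = hotspot.getVMOption("MaxDirectMemorySize").getValue();
        System.out.println("MaxDirectMemorySize: " + maxDirect);
    }
}
```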

However, during usage we encountered situations where users experienced data 
access timeouts: downloading just a few megabytes of data could take 
20-30 seconds or even longer. While analyzing this issue, we focused on the GC 
(Garbage Collection) logs.

In the GC logs we observed a striking pattern: a large number of System.gc() 
calls within a short period. Initially we assumed these calls were triggered by 
some third-party package. It wasn't until we captured stack traces for the 
System.gc() invocations that we found the real caller:
{code:java}
groupManagement;id=6d;is_daemon=false;priority=5;TCCL=sun.misc.Launcher$AppClassLoader@214c265e
    @java.lang.System.gc()
        at java.nio.Bits.reserveMemory(Bits.java:666)
        at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
        at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
        at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.<init>(SegmentedRaftLogWorker.java:208)
        at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.<init>(SegmentedRaftLog.java:217)
        at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.<init>(SegmentedRaftLog.java:86)
        at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog$Builder.build(SegmentedRaftLog.java:608)
        at org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:189)
        at org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:166)
        at org.apache.ratis.server.impl.ServerState.lambda$new$6(ServerState.java:130)
        at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
        at org.apache.ratis.server.impl.ServerState.initialize(ServerState.java:147)
        at org.apache.ratis.server.impl.RaftServerImpl.start(RaftServerImpl.java:392)
        at org.apache.ratis.server.impl.RaftServerProxy.lambda$groupAddAsync$13(RaftServerProxy.java:509)
        at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
        at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
        at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{code}
The frame java.nio.Bits.reserveMemory(Bits.java:666) shows that a direct 
(off-heap) allocation ran into the direct-memory limit; in that case the JDK 
calls System.gc() to try to free unreferenced DirectByteBuffers before giving 
up. As long as the limit stays exhausted, this turns into a cycle of repeated 
System.gc() calls, which explains why so many of them appear within a short 
period. After identifying this, we dumped the DN heap and indeed found a large 
number of DirectByteBuffer instances.
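
The DirectByteBuffer instances seen in the heap dump correspond to the JVM's built-in "direct" buffer pool, which can also be inspected on a live process; a small sketch using only standard JDK APIs (not Ozone code) that shows an allocateDirect() call being charged against that pool:
```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.List;

public class DirectPoolDemo {
    static BufferPoolMXBean directPool() {
        List<BufferPoolMXBean> pools =
            ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean p : pools) {
            if ("direct".equals(p.getName())) {
                return p;
            }
        }
        throw new IllegalStateException("no direct buffer pool found");
    }

    public static void main(String[] args) {
        BufferPoolMXBean direct = directPool();
        long before = direct.getMemoryUsed();

        // Every allocateDirect() goes through Bits.reserveMemory(), which
        // charges the -XX:MaxDirectMemorySize budget and, once the budget
        // is exhausted, falls back to System.gc() before failing with OOM.
        ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20); // 1 MiB

        long after = direct.getMemoryUsed();
        System.out.println("allocated " + buf.capacity() + " bytes; "
            + "direct pool grew by " + (after - before) + " bytes");
    }
}
```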

Our team members also analyzed the off-heap memory with GDB and found a large 
number of Netty objects. At that point we wondered whether any metric could 
quickly show off-heap memory usage. I researched JMX and found that HDDS-9070 
added support for displaying Netty's off-heap memory usage, so we wired this 
metric into Grafana.
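
For anyone wiring such a panel up by hand: besides the Netty metric from HDDS-9070, the JDK itself always exposes its buffer pools under standard JMX object names, which is what a JMX exporter queries. A hedged sketch of that query (standard JDK/JMX API only):
```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxDirectMemory {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // The JDK registers one BufferPool MBean per pool: "direct" covers
        // DirectByteBuffer allocations, "mapped" covers memory-mapped files.
        ObjectName direct = new ObjectName("java.nio:type=BufferPool,name=direct");
        long used  = (Long) server.getAttribute(direct, "MemoryUsed");
        long count = (Long) server.getAttribute(direct, "Count");
        System.out.println("direct buffers: " + count + ", bytes used: " + used);
    }
}
```
Note this only covers NIO direct buffers; memory Netty allocates via its pooled allocator outside java.nio (e.g. through noCleaner buffers) is exactly why the dedicated Netty metric is still needed.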

!image-2024-07-09-09-21-10-139.png!


> OM/SCM Metrics support displaying Netty off-heap memory.
> --------------------------------------------------------
>
>                 Key: HDDS-11100
>                 URL: https://issues.apache.org/jira/browse/HDDS-11100
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: OM, SCM
>    Affects Versions: 1.4.1
>            Reporter: Shilun Fan
>            Assignee: Shilun Fan
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2024-07-09-09-21-10-139.png
>
>
> During a recent period, our cluster's DataNodes (DNs) exhibited high off-heap 
> memory utilization, triggering frequent System.gc() calls. Reading the 
> relevant GC logs and monitoring heap metrics showed normal heap memory usage, 
> so it became apparent that off-heap memory usage might be causing the 
> System.gc() calls. HDDS-9070 added support for displaying Netty off-heap 
> memory usage on the DataNode, which facilitates off-heap memory monitoring 
> and indirectly helped us infer a potential off-heap memory overflow. 
> Similarly, we can display Netty off-heap memory usage on OM and SCM.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
