[ https://issues.apache.org/jira/browse/HDDS-11240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874137#comment-17874137 ]
weiming commented on HDDS-11240:
--------------------------------
[~ritesh]
I didn't capture a flame graph after finalization.
We haven't monitored the validateRequestLatencyNs metric before. We expect it
to rise once the CPU load increases, so we will keep an eye on it from now on.
The high CPU load looks basically the same every time: the time is spent in
expungeStaleEntry.
The conditions that trigger this problem in our production environment:
1. G1 GC with OpenJDK 17.0.2+8-86
2. High cluster throughput, with the OM process running for about 1 to 2 days
Under these conditions the problem always occurs.
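For illustration only, here is a minimal Java sketch of the pattern visible in
the stack trace: a read lock taken around a simple flag read, so every call
ends in ReadLock.unlock(), which the trace shows reaching ThreadLocal.remove()
and expungeStaleEntry. The class FinalizationFlagSketch and all of its method
names are hypothetical and are not Ozone code. The lock-free variant is just
one possible way to keep the hot path off the read lock, under the assumption
that the flag can be read without other state; it is not a statement about how
this issue is or should be fixed.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch, NOT the Ozone AbstractLayoutVersionManager.
class FinalizationFlagSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // volatile so that readers which skip the lock still see the latest write
    private volatile boolean needsFinalization;

    FinalizationFlagSketch(boolean needsFinalization) {
        this.needsFinalization = needsFinalization;
    }

    // Pattern from the stack trace: each call acquires and releases the
    // read lock, so each call goes through ReadLock.unlock().
    boolean needsFinalizationLocked() {
        lock.readLock().lock();
        try {
            return needsFinalization;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Assumed alternative: read the volatile flag directly, without
    // touching the read lock or its per-thread hold counters.
    boolean needsFinalizationLockFree() {
        return needsFinalization;
    }

    // Writers still serialize on the write lock.
    void markFinalized() {
        lock.writeLock().lock();
        try {
            needsFinalization = false;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        FinalizationFlagSketch sketch = new FinalizationFlagSketch(true);
        System.out.println("locked read:    " + sketch.needsFinalizationLocked());
        System.out.println("lock-free read: " + sketch.needsFinalizationLockFree());
        sketch.markFinalized();
        System.out.println("after finalize: " + sketch.needsFinalizationLockFree());
    }
}
```

Both read methods return the same value; they differ only in whether the
request path goes through the read lock.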
> High cpu usage on ReadWrite locks in JDK17
> ------------------------------------------
>
> Key: HDDS-11240
> URL: https://issues.apache.org/jira/browse/HDDS-11240
> Project: Apache Ozone
> Issue Type: Bug
> Components: OM
> Affects Versions: 1.4.0
> Environment: JDK:
> openjdk 17.0.2 2022-01-18
> OpenJDK Runtime Environment (build 17.0.2+8-86)
> OpenJDK 64-Bit Server VM (build 17.0.2+8-86, mixed mode, sharing)
> Ozone:
> 1.4.0
>
> Reporter: weiming
> Assignee: Tanvi Penumudy
> Priority: Major
> Attachments: flamegraph.profile.html,
> image-2024-07-28-20-17-58-466.png, image-2024-07-30-09-32-16-320.png
>
>
> That will cause threads on the following stack trace to consume a lot of CPU:
> "IPC Server handler 7 on default port 9862" #3994 daemon prio=5 os_prio=0 cpu=5403833.36ms elapsed=653145.54s tid=0x00007fa03fdd2a00 nid=0x921f9 runnable [0x00007fa0ca3fd000]
> java.lang.Thread.State: RUNNABLE
> at java.lang.ThreadLocal$ThreadLocalMap.expungeStaleEntry(java.base@17.0.2/ThreadLocal.java:632)
> at java.lang.ThreadLocal$ThreadLocalMap.remove(java.base@17.0.2/ThreadLocal.java:516)
> at java.lang.ThreadLocal.remove(java.base@17.0.2/ThreadLocal.java:242)
> at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryReleaseShared(java.base@17.0.2/ReentrantReadWriteLock.java:430)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.releaseShared(java.base@17.0.2/AbstractQueuedSynchronizer.java:1094)
> at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.unlock(java.base@17.0.2/ReentrantReadWriteLock.java:897)
> at org.apache.hadoop.ozone.upgrade.AbstractLayoutVersionManager.needsFinalization(AbstractLayoutVersionManager.java:182)
> at org.apache.hadoop.ozone.om.request.validation.ValidationCondition$1.shouldApply(ValidationCondition.java:39)
> at org.apache.hadoop.ozone.om.request.validation.RequestValidations.lambda$0(RequestValidations.java:110)
> at org.apache.hadoop.ozone.om.request.validation.RequestValidations$$Lambda$839/0x00000008013cda80.test(Unknown Source)
>
> [^flamegraph.profile.html]