[
https://issues.apache.org/jira/browse/HDDS-11240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869484#comment-17869484
]
weiming commented on HDDS-11240:
--------------------------------
[~erose]
I'm not entirely sure if the cluster is in a finalized state, as we've had
frequent upgrades (internal versions) recently.
When we were using JDK 8 earlier, we didn't notice this issue. However, this
doesn't mean the problem didn't exist. Our cluster was not very stable back
then, and we frequently restarted or switched the OM leader, so the issue went
unnoticed.
We initially suspected the JDK version because we had encountered a similar
issue in OzoneManagerLock, where ThreadLocal operations also drove the CPU
load very high. We therefore modified the code to disable some
metrics-related information. After that change the situation improved and
the issue occurred less often. However, we still experienced high CPU load,
this time within ReentrantReadWriteLock.
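To make the code path concrete, here is a minimal sketch of the pattern the
quoted stack trace points at: a status flag read under a
ReentrantReadWriteLock read lock from many handler threads. The class
LayoutStateSketch and its methods are hypothetical and are not Ozone's actual
code. The sketch assumes, based on the frames in the trace, that releasing a
read hold can reach ThreadLocal.remove() through tryReleaseShared. It does not
try to reproduce the CPU spike itself.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a read-locked status check; not Ozone's actual code.
class LayoutStateSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private boolean needsFinalization; // guarded by lock

    LayoutStateSketch(boolean needsFinalization) {
        this.needsFinalization = needsFinalization;
    }

    // Same shape as the hot path in the trace: take the read lock, read, unlock.
    boolean needsFinalization() {
        lock.readLock().lock();
        try {
            return needsFinalization;
        } finally {
            // In the reported stack, this unlock() leads to tryReleaseShared
            // and from there to ThreadLocal.remove().
            lock.readLock().unlock();
        }
    }

    void markFinalized() {
        lock.writeLock().lock();
        try {
            needsFinalization = false;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Number of read holds the calling thread currently has on the lock.
    int readHoldsOfCurrentThread() {
        return lock.getReadHoldCount();
    }

    // Calls the check from several threads; returns how many calls saw 'true'.
    static int hammer(LayoutStateSketch state, int threads, int callsPerThread) {
        AtomicInteger trueResults = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int c = 0; c < callsPerThread; c++) {
                    if (state.needsFinalization()) {
                        trueResults.incrementAndGet();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return trueResults.get();
    }

    public static void main(String[] args) {
        LayoutStateSketch state = new LayoutStateSketch(true);
        System.out.println(hammer(state, 8, 100_000));
    }
}
```

Running hammer under a profiler with different thread counts would be one way
to compare behaviour across JDK versions, although whether it shows the same
hot spot depends on what else each thread keeps in its ThreadLocal map.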
> High cpu usage on ReadWrite locks in JDK17
> ------------------------------------------
>
> Key: HDDS-11240
> URL: https://issues.apache.org/jira/browse/HDDS-11240
> Project: Apache Ozone
> Issue Type: Bug
> Affects Versions: 1.4.0
> Environment: JDK:
> openjdk 17.0.2 2022-01-18
> OpenJDK Runtime Environment (build 17.0.2+8-86)
> OpenJDK 64-Bit Server VM (build 17.0.2+8-86, mixed mode, sharing)
> Ozone:
> 1.4.0
>
> Reporter: weiming
> Assignee: Tanvi Penumudy
> Priority: Major
> Attachments: flamegraph.profile.html,
> image-2024-07-28-20-17-58-466.png, image-2024-07-30-09-32-16-320.png
>
>
> That will cause threads on the following stack trace to consume a lot of CPU:
> "IPC Server handler 7 on default port 9862" #3994 daemon prio=5 os_prio=0 cpu=5403833.36ms elapsed=653145.54s tid=0x00007fa03fdd2a00 nid=0x921f9 runnable [0x00007fa0ca3fd000]
> java.lang.Thread.State: RUNNABLE
> at java.lang.ThreadLocal$ThreadLocalMap.expungeStaleEntry(java.base@17.0.2/ThreadLocal.java:632)
> at java.lang.ThreadLocal$ThreadLocalMap.remove(java.base@17.0.2/ThreadLocal.java:516)
> at java.lang.ThreadLocal.remove(java.base@17.0.2/ThreadLocal.java:242)
> at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryReleaseShared(java.base@17.0.2/ReentrantReadWriteLock.java:430)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.releaseShared(java.base@17.0.2/AbstractQueuedSynchronizer.java:1094)
> at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.unlock(java.base@17.0.2/ReentrantReadWriteLock.java:897)
> at org.apache.hadoop.ozone.upgrade.AbstractLayoutVersionManager.needsFinalization(AbstractLayoutVersionManager.java:182)
> at org.apache.hadoop.ozone.om.request.validation.ValidationCondition$1.shouldApply(ValidationCondition.java:39)
> at org.apache.hadoop.ozone.om.request.validation.RequestValidations.lambda$0(RequestValidations.java:110)
> at org.apache.hadoop.ozone.om.request.validation.RequestValidations$$Lambda$839/0x00000008013cda80.test(Unknown Source)
>
> [^flamegraph.profile.html]