[ 
https://issues.apache.org/jira/browse/HDDS-11240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869484#comment-17869484
 ] 

weiming commented on HDDS-11240:
--------------------------------

[~erose] 

I'm not entirely sure if the cluster is in a finalized state, as we've had 
frequent upgrades (internal versions) recently.

When we were using JDK 8 earlier, we didn't notice this issue. However, this 
doesn't mean the problem didn't exist. Our cluster was not very stable back 
then, and we frequently restarted or switched the OM leader, so the issue went 
unnoticed.

We initially suspected the JDK version because we had encountered a similar 
issue in OzoneManagerLock, where ThreadLocal operations also drove the CPU 
load very high. We therefore modified the code to disable some of the 
metrics-related code paths there. After these changes the situation improved 
and the issue occurred less often. However, we still experienced high CPU 
load, this time inside ReentrantReadWriteLock.
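
As a rough illustration of the pattern in the stack trace below (a sketch with 
hypothetical names, not the actual Ozone code): every request reads a flag 
under the read lock, and releasing the read lock goes through 
ReentrantReadWriteLock$Sync.tryReleaseShared, which is where the trace shows 
ThreadLocal.remove being called. The lock-free variant that publishes the flag 
through a volatile field is only an assumption about one possible mitigation, 
not a description of what the project does.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a layout version manager; all names are illustrative.
class LayoutStateSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Guarded by 'lock'.
    private boolean needsFinalizationGuarded;

    // Same value, published through a volatile field so readers need no lock.
    private volatile boolean needsFinalizationVolatile;

    LayoutStateSketch(boolean initial) {
        this.needsFinalizationGuarded = initial;
        this.needsFinalizationVolatile = initial;
    }

    // Pattern seen in the stack trace: each call acquires and releases the
    // read lock, so each call also pays for the lock's per-thread
    // hold-count bookkeeping.
    boolean needsFinalizationLocked() {
        lock.readLock().lock();
        try {
            return needsFinalizationGuarded;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Assumed alternative for the hot path: a plain volatile read.
    boolean needsFinalizationLockFree() {
        return needsFinalizationVolatile;
    }

    // Writers still serialize on the write lock and update both copies.
    void markFinalized() {
        lock.writeLock().lock();
        try {
            needsFinalizationGuarded = false;
            needsFinalizationVolatile = false;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        LayoutStateSketch state = new LayoutStateSketch(true);
        System.out.println("locked read:    " + state.needsFinalizationLocked());
        System.out.println("lock-free read: " + state.needsFinalizationLockFree());
        state.markFinalized();
        System.out.println("after finalize: " + state.needsFinalizationLockFree());
    }
}
```

A volatile read is only enough when the reader needs a single value; if 
several fields must be read consistently, the lock (or an immutable snapshot 
object) is still required.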

> High cpu usage on ReadWrite locks in JDK17
> ------------------------------------------
>
>                 Key: HDDS-11240
>                 URL: https://issues.apache.org/jira/browse/HDDS-11240
>             Project: Apache Ozone
>          Issue Type: Bug
>    Affects Versions: 1.4.0
>         Environment: JDK:
> openjdk 17.0.2 2022-01-18
> OpenJDK Runtime Environment (build 17.0.2+8-86)
> OpenJDK 64-Bit Server VM (build 17.0.2+8-86, mixed mode, sharing)
> Ozone:
> 1.4.0
>  
>            Reporter: weiming
>            Assignee: Tanvi Penumudy
>            Priority: Major
>         Attachments: flamegraph.profile.html, 
> image-2024-07-28-20-17-58-466.png, image-2024-07-30-09-32-16-320.png
>
>
> That will cause threads on the following stack trace to consume a lot of CPU:
> "IPC Server handler 7 on default port 9862" #3994 daemon prio=5 os_prio=0 cpu=5403833.36ms elapsed=653145.54s tid=0x00007fa03fdd2a00 nid=0x921f9 runnable  [0x00007fa0ca3fd000]
>    java.lang.Thread.State: RUNNABLE
>         at java.lang.ThreadLocal$ThreadLocalMap.expungeStaleEntry(java.base@17.0.2/ThreadLocal.java:632)
>         at java.lang.ThreadLocal$ThreadLocalMap.remove(java.base@17.0.2/ThreadLocal.java:516)
>         at java.lang.ThreadLocal.remove(java.base@17.0.2/ThreadLocal.java:242)
>         at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryReleaseShared(java.base@17.0.2/ReentrantReadWriteLock.java:430)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.releaseShared(java.base@17.0.2/AbstractQueuedSynchronizer.java:1094)
>         at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.unlock(java.base@17.0.2/ReentrantReadWriteLock.java:897)
>         at org.apache.hadoop.ozone.upgrade.AbstractLayoutVersionManager.needsFinalization(AbstractLayoutVersionManager.java:182)
>         at org.apache.hadoop.ozone.om.request.validation.ValidationCondition$1.shouldApply(ValidationCondition.java:39)
>         at org.apache.hadoop.ozone.om.request.validation.RequestValidations.lambda$0(RequestValidations.java:110)
>         at org.apache.hadoop.ozone.om.request.validation.RequestValidations$$Lambda$839/0x00000008013cda80.test(Unknown Source)
>  
> [^flamegraph.profile.html]


