[ https://issues.apache.org/jira/browse/HDDS-11240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871531#comment-17871531 ]

weiming commented on HDDS-11240:
--------------------------------

[~smeng]  
Because of our cluster's throughput and scale, we hit this problem quite
frequently (roughly every 2-3 days). We have reviewed newer JDK 17 releases
(e.g., 17.0.11 and 17.0.12), but we are not optimistic that upgrading to a newer
JDK 17 release will resolve the issue, because we have not found any
significant changes related to ThreadLocal.

Therefore, our current approach is twofold:
First, we are trying to remove some of the locks that cause the high CPU load,
at least to ensure that our production environment no longer hits this
problem. We have built a modified version, which is currently in a grayscale
release, and we are monitoring it continuously.
Second, we are testing a newer JDK 17 release (e.g., 17.0.12) to see whether
the problem persists. As you mentioned, if it does, the problem may lie in the
JDK itself.
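For context on the first step: in the JDK 17 trace quoted below, releasing a read hold (ReentrantReadWriteLock$ReadLock.unlock) goes through tryReleaseShared into ThreadLocal.remove, because the lock keeps per-thread read-hold counts in a ThreadLocal. One way to take a read-mostly check such as needsFinalization() off that path is to publish the state as an immutable snapshot in a volatile field, so readers never touch the lock. This is a minimal sketch under stated assumptions: the class LayoutVersionState, the method finalizeTo, and the "metadata version < software version" comparison are hypothetical and for illustration only; they are not Ozone's code or the patch in our grayscale build.

```java
/**
 * Hypothetical sketch (not Ozone's code): a layout-version holder whose hot
 * read path takes no lock. Both versions are published together as an
 * immutable snapshot in a volatile field, so needsFinalization() is a single
 * volatile read and never goes through ReentrantReadWriteLock's per-thread
 * read-hold bookkeeping (the ThreadLocal.remove() frames in the trace).
 */
final class LayoutVersionState {

  /** Immutable pair of versions; replaced as a whole on every update. */
  private static final class Snapshot {
    final int metadataLayoutVersion;
    final int softwareLayoutVersion;

    Snapshot(int metadataLayoutVersion, int softwareLayoutVersion) {
      this.metadataLayoutVersion = metadataLayoutVersion;
      this.softwareLayoutVersion = softwareLayoutVersion;
    }
  }

  // volatile: readers always see the most recently published snapshot,
  // and both fields of that snapshot are read as one consistent pair.
  private volatile Snapshot snapshot;

  LayoutVersionState(int metadataLayoutVersion, int softwareLayoutVersion) {
    this.snapshot = new Snapshot(metadataLayoutVersion, softwareLayoutVersion);
  }

  /** Lock-free read path: one volatile read, no lock, no ThreadLocal. */
  boolean needsFinalization() {
    Snapshot s = snapshot;
    return s.metadataLayoutVersion < s.softwareLayoutVersion;
  }

  /**
   * Rare write path: writers are serialized with synchronized so that the
   * read-modify-write of the snapshot cannot lose a concurrent update.
   */
  synchronized void finalizeTo(int newMetadataLayoutVersion) {
    Snapshot s = snapshot;
    snapshot = new Snapshot(newMetadataLayoutVersion, s.softwareLayoutVersion);
  }
}
```

Usage: new LayoutVersionState(3, 5).needsFinalization() returns true until finalizeTo(5) is called. The trade-off is one small allocation per update, which is cheap assuming updates are rare compared with reads. If a lock is still wanted, java.util.concurrent.locks.StampedLock is another option, since it does not keep per-thread read-hold counts in a ThreadLocal; whether either approach fits Ozone's locking would need to be checked against the actual code.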

> High cpu usage on ReadWrite locks in JDK17
> ------------------------------------------
>
>                 Key: HDDS-11240
>                 URL: https://issues.apache.org/jira/browse/HDDS-11240
>             Project: Apache Ozone
>          Issue Type: Bug
>    Affects Versions: 1.4.0
>         Environment: JDK:
> openjdk 17.0.2 2022-01-18
> OpenJDK Runtime Environment (build 17.0.2+8-86)
> OpenJDK 64-Bit Server VM (build 17.0.2+8-86, mixed mode, sharing)
> Ozone:
> 1.4.0
>  
>            Reporter: weiming
>            Assignee: Tanvi Penumudy
>            Priority: Major
>         Attachments: flamegraph.profile.html, 
> image-2024-07-28-20-17-58-466.png, image-2024-07-30-09-32-16-320.png
>
>
> This causes threads with the following stack trace to consume a lot of CPU:
> "IPC Server handler 7 on default port 9862" #3994 daemon prio=5 os_prio=0 cpu=5403833.36ms elapsed=653145.54s tid=0x00007fa03fdd2a00 nid=0x921f9 runnable  [0x00007fa0ca3fd000]
>    java.lang.Thread.State: RUNNABLE
>         at java.lang.ThreadLocal$ThreadLocalMap.expungeStaleEntry(java.base@17.0.2/ThreadLocal.java:632)
>         at java.lang.ThreadLocal$ThreadLocalMap.remove(java.base@17.0.2/ThreadLocal.java:516)
>         at java.lang.ThreadLocal.remove(java.base@17.0.2/ThreadLocal.java:242)
>         at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryReleaseShared(java.base@17.0.2/ReentrantReadWriteLock.java:430)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.releaseShared(java.base@17.0.2/AbstractQueuedSynchronizer.java:1094)
>         at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.unlock(java.base@17.0.2/ReentrantReadWriteLock.java:897)
>         at org.apache.hadoop.ozone.upgrade.AbstractLayoutVersionManager.needsFinalization(AbstractLayoutVersionManager.java:182)
>         at org.apache.hadoop.ozone.om.request.validation.ValidationCondition$1.shouldApply(ValidationCondition.java:39)
>         at org.apache.hadoop.ozone.om.request.validation.RequestValidations.lambda$0(RequestValidations.java:110)
>         at org.apache.hadoop.ozone.om.request.validation.RequestValidations$$Lambda$839/0x00000008013cda80.test(Unknown Source)
>  
> [^flamegraph.profile.html]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)