[https://issues.apache.org/jira/browse/HDDS-11331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874205#comment-17874205]
JiangHua Zhu commented on HDDS-11331:
-------------------------------------
The jstack dump shows 1303 threads in the BLOCKED state, all of them waiting on
StateContext#pipelineActions.
{code:java}
'1a4f60dc-0142-40cf-8692-e466c4284f0c-EndpointStateMachineTaskThread-bigdate01/10.10.10.10:9861-0' thread is stuck on line #824 of
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis
file in calculatePipelineBytesWritten() method. Before getting stuck, this
thread obtained 1 lock (java.util.HashMap lock) and never released it. Due to
that 1303 threads are BLOCKED as shown in this stack trace. If threads are
BLOCKED for a prolonged period, your application can become unresponsive.
{code}
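The analysis above describes a lock convoy: one thread holds the monitor of a shared HashMap while doing slow work, and every other thread that needs the same monitor parks in BLOCKED. The sketch below is a self-contained illustration under stated assumptions, not Ozone code: `SnapshotUnderLockDemo`, `pendingActions`, `drainSlow`, `drainFast` and `expensiveDecorate` are hypothetical stand-ins for StateContext#pipelineActions and calculatePipelineBytesWritten(). The excerpt alone does not tell us whether that method is merely slow or truly hung. The sketch contrasts doing slow work under the lock with the usual mitigation: copy what is needed while holding the lock, release it, then do the slow work.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SnapshotUnderLockDemo {
    // Hypothetical stand-in for a HashMap guarded by its own monitor,
    // similar in spirit to StateContext#pipelineActions.
    private final Map<String, List<String>> pendingActions = new HashMap<>();

    public void addAction(String endpoint, String action) {
        synchronized (pendingActions) {
            pendingActions.computeIfAbsent(endpoint, k -> new ArrayList<>()).add(action);
        }
    }

    // Problematic pattern: slow work runs while the monitor is held, so every
    // other caller of addAction() or drain*() stays BLOCKED until it finishes.
    public List<String> drainSlow(String endpoint) {
        synchronized (pendingActions) {
            List<String> actions = pendingActions.remove(endpoint);
            List<String> result = new ArrayList<>();
            if (actions != null) {
                for (String a : actions) {
                    result.add(expensiveDecorate(a)); // slow call under the lock
                }
            }
            return result;
        }
    }

    // Mitigation: take a snapshot under the lock, release it, then do the slow work.
    public List<String> drainFast(String endpoint) {
        List<String> snapshot;
        synchronized (pendingActions) {
            List<String> removed = pendingActions.remove(endpoint);
            snapshot = removed == null ? new ArrayList<>() : removed;
        }
        List<String> result = new ArrayList<>(snapshot.size());
        for (String a : snapshot) {
            result.add(expensiveDecorate(a)); // runs without holding the monitor
        }
        return result;
    }

    // Hypothetical placeholder for a slow computation such as
    // calculatePipelineBytesWritten(); here it only sleeps briefly.
    private static String expensiveDecorate(String action) {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return action + ":done";
    }

    public static void main(String[] args) {
        SnapshotUnderLockDemo demo = new SnapshotUnderLockDemo();
        demo.addAction("scm", "CLOSE_PIPELINE");
        demo.addAction("scm", "CLOSE_PIPELINE_2");
        System.out.println(demo.drainFast("scm")); // prints [CLOSE_PIPELINE:done, CLOSE_PIPELINE_2:done]
    }
}
```

The snapshot approach keeps the critical section down to a single map removal. The trade-off is that the slow step works on a copy, so it must not assume the map is unchanged while it runs.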
> Datanode cannot report for a long time
> --------------------------------------
>
> Key: HDDS-11331
> URL: https://issues.apache.org/jira/browse/HDDS-11331
> Project: Apache Ozone
> Issue Type: Improvement
> Components: DN
> Affects Versions: 1.4.0
> Reporter: JiangHua Zhu
> Assignee: JiangHua Zhu
> Priority: Major
> Attachments: 1505js.1
>
>
> This was observed on an online (production) cluster.
> SCM shows that some Datanodes have not reported for a long time, and their
> status is DEAD or STALE.
> I captured a jstack dump, which shows that access to
> StateContext#pipelineActions is stuck, so the Datanode cannot report to
> SCM/Recon.
> The jstack output has been uploaded as an attachment.
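A summary like the one above (many BLOCKED threads on a single lock) can also be produced from inside a JVM. The sketch below uses the standard java.lang.management API (`ThreadMXBean#dumpAllThreads`, `ThreadInfo#getThreadState`, `ThreadInfo#getLockName`) to count BLOCKED threads per contended lock. The class name `BlockedThreadCounter` and the demo threads in `main` are hypothetical and not part of Ozone. For a running Datanode, `jstack <pid>` remains the simpler route.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

public class BlockedThreadCounter {
    // Counts BLOCKED threads in the current JVM, keyed by the lock they wait on
    // (ThreadInfo#getLockName returns "<class name>@<identity hash in hex>").
    public static Map<String, Integer> countBlockedByLock() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Integer> counts = new TreeMap<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                String lock = String.valueOf(info.getLockName());
                counts.merge(lock, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        final Object shared = new Object();
        // Holder thread keeps the monitor for a while, like the stuck thread in the report.
        Thread holder = new Thread(() -> {
            synchronized (shared) {
                try {
                    Thread.sleep(2000);
                } catch (InterruptedException ignored) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        holder.setDaemon(true);
        holder.start();
        Thread.sleep(100); // give the holder time to acquire the monitor first
        // Several waiters contend for the same monitor and end up BLOCKED.
        for (int i = 0; i < 3; i++) {
            Thread t = new Thread(() -> {
                synchronized (shared) {
                    // nothing to do; only acquiring the monitor matters here
                }
            });
            t.setDaemon(true);
            t.start();
        }
        Thread.sleep(200);
        System.out.println(countBlockedByLock());
    }
}
```

The printed map keys include identity hash codes, so the exact output differs from run to run; the useful signal is a single lock with a large count.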