[ 
https://issues.apache.org/jira/browse/HDDS-11331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874205#comment-17874205
 ] 

JiangHua Zhu commented on HDDS-11331:
-------------------------------------

The jstack output shows 1303 threads in the BLOCKED state, all of them waiting 
on StateContext#pipelineActions.
{code:java}
'1a4f60dc-0142-40cf-8692-e466c4284f0c-EndpointStateMachineTaskThread-bigdate01/10.10.10.10:9861-0'
thread is stuck on line #824 of 
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis
file in calculatePipelineBytesWritten() method. Before getting stuck, this 
thread obtained 1 lock (java.util.HashMap lock) and never released it. Due to 
that 1303 threads are BLOCKED as shown in this stack trace. If threads are 
BLOCKED for a prolonged period, your application can become unresponsive.
{code}
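
For reference, here is a minimal sketch of the pattern the analysis above describes: one thread holds the monitor of a shared HashMap while doing slow work, and every other thread that synchronizes on the same map ends up BLOCKED. This is not the Ozone code. The names BlockedOnMapDemo, ACTIONS and countBlockedWaiters are hypothetical, and the slow work is simulated with a latch rather than a real calculatePipelineBytesWritten() call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

class BlockedOnMapDemo {
  // Hypothetical stand-in for a shared map that is guarded by its own monitor.
  private static final Map<String, List<String>> ACTIONS = new HashMap<>();

  /** Starts one slow lock holder plus N waiters and returns how many waiters were BLOCKED. */
  static int countBlockedWaiters(int waiters) {
    CountDownLatch lockHeld = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);
    try {
      Thread holder = new Thread(() -> {
        synchronized (ACTIONS) {
          lockHeld.countDown();
          try {
            release.await(); // simulated slow work while still holding the lock
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        }
      }, "holder");
      holder.start();
      lockHeld.await(); // make sure the holder owns the monitor first

      List<Thread> threads = new ArrayList<>();
      for (int i = 0; i < waiters; i++) {
        Thread t = new Thread(() -> {
          synchronized (ACTIONS) { // every waiter needs the same monitor
            ACTIONS.computeIfAbsent("pipeline", k -> new ArrayList<>()).add("action");
          }
        }, "waiter-" + i);
        t.start();
        threads.add(t);
      }

      // Poll (for at most 5 seconds) until every waiter is parked on the monitor.
      int blocked = 0;
      long deadline = System.nanoTime() + 5_000_000_000L;
      while (System.nanoTime() < deadline) {
        blocked = 0;
        for (Thread t : threads) {
          if (t.getState() == Thread.State.BLOCKED) {
            blocked++;
          }
        }
        if (blocked == waiters) {
          break;
        }
        Thread.sleep(10);
      }

      release.countDown(); // the holder finishes and the waiters can proceed
      holder.join();
      for (Thread t : threads) {
        t.join();
      }
      return blocked;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("BLOCKED waiters: " + countBlockedWaiters(8));
  }
}
```

In this sketch the waiters stay BLOCKED only until the holder leaves its synchronized block. In the dump, the holder apparently never gets that far, which would explain why the count of BLOCKED threads keeps growing.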


> Datanode cannot report for a long time
> --------------------------------------
>
>                 Key: HDDS-11331
>                 URL: https://issues.apache.org/jira/browse/HDDS-11331
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: DN
>    Affects Versions: 1.4.0
>            Reporter: JiangHua Zhu
>            Assignee: JiangHua Zhu
>            Priority: Major
>         Attachments: 1505js.1
>
>
> This was observed on a live cluster.
> SCM shows that some Datanodes have not reported for a long time, and their 
> status is DEAD or STALE.
> I captured a jstack dump, which shows that threads are stuck on 
> StateContext#pipelineActions, so the Datanode cannot report to SCM/Recon.
> The jstack output has been uploaded as an attachment.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
