[jira] [Commented] (HDDS-11331) Fix Datanode unable to report for a long time

Shilun Fan (Jira) Mon, 19 Aug 2024 18:27:04 -0700


    [ 
https://issues.apache.org/jira/browse/HDDS-11331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875021#comment-17875021
 ]


Shilun Fan commented on HDDS-11331:
-----------------------------------

[~szetszwo] [~jianghuazhu] Sorry, I have a concern about the newly added 
heartbeat lifeline. Today, if a problem arises inside a DN, such as the pipeline 
problem mentioned earlier, we can detect it through the heartbeat timeout 
and locate the problematic DN on the SCM page. With a lifeline, however, 
the DN might appear healthy while actually being unusable. How can we promptly 
identify such a DN? 
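
To make the concern concrete, here is a minimal sketch of one possible way to tell the two cases apart. It is not Ozone code: the names NodeRecord and classify, the state ALIVE_BUT_NOT_REPORTING, and both thresholds are hypothetical and exist only for illustration. The assumption is that SCM could record the time of the last lifeline separately from the time of the last full heartbeat carrying reports.

```python
from dataclasses import dataclass

# Hypothetical thresholds in seconds; these are not Ozone configuration values.
STALE_AFTER = 90.0
REPORT_OVERDUE_AFTER = 300.0


@dataclass
class NodeRecord:
    last_lifeline: float     # last time a lightweight lifeline arrived
    last_full_report: float  # last time a full heartbeat with reports arrived


def classify(node: NodeRecord, now: float) -> str:
    """Classify a datanode from its two timestamps."""
    if now - node.last_lifeline > STALE_AFTER:
        # No signal at all: the existing heartbeat-timeout path covers this.
        return "STALE"
    if now - node.last_full_report > REPORT_OVERDUE_AFTER:
        # The lifeline keeps arriving but reports are stuck: the case asked about.
        return "ALIVE_BUT_NOT_REPORTING"
    return "HEALTHY"
```

With something along these lines, a DN whose report path is stuck would still be visible as a distinct state rather than looking normal. Whether the lifeline change can or should expose such a state is exactly the open question.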

> Fix Datanode unable to report for a long time
> ---------------------------------------------
>
>                 Key: HDDS-11331
>                 URL: https://issues.apache.org/jira/browse/HDDS-11331
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: DN
>    Affects Versions: 1.4.0
>            Reporter: JiangHua Zhu
>            Assignee: JiangHua Zhu
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: 1505js.1, 7090_review.patch, screenshot-1.png, 
> screenshot-2.png, screenshot-3.png, screenshot-4.png, screenshot-5.png
>
>
> SCM shows that some Datanodes cannot report for a long time, and their status 
> is DEAD or STALE.
> I printed jstack information, which shows that StateContext#pipelineActions 
> is stuck and cannot report to SCM/Recon.
> The jstack information has been uploaded as an attachment.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

