[ https://issues.apache.org/jira/browse/HDDS-11463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882142#comment-17882142 ]

Ethan Rose commented on HDDS-11463:
-----------------------------------

This would be a good thing to add. Datanodes already report disk health to SCM
via [storage reports|https://github.com/apache/ozone/blob/6c6dc4352e475c56a5972f8fe355c0aca2a6554c/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/StorageLocationReport.java#L39]
attached to heartbeats, so the main gap is exposing this information to users
more clearly. I think Recon already has a page for this, but an SCM CLI with
JSON output would be good as well.
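A minimal sketch of what SCM-side tracking could look like, assuming each heartbeat storage report can be reduced to one (storage location, failed) pair per volume. `FailedVolumeTracker`, `VolumeReport`, and the JSON field names are hypothetical. They are not Ozone's `StorageLocationReport` API or an existing SCM command.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: SCM-side tracking of failed DataNode storage
 * locations, rendered as JSON for a CLI. The types here are illustrative
 * stand-ins, not Ozone classes.
 */
public class FailedVolumeTracker {

  /** Hypothetical per-volume entry extracted from a heartbeat storage report. */
  record VolumeReport(String storageLocation, boolean failed) {}

  // datanode ID -> failed storage locations from its most recent report;
  // TreeMap keeps the JSON output ordered by datanode ID
  private final Map<String, List<String>> failedByDatanode = new TreeMap<>();

  /** Replaces the datanode's failed-volume list with the latest report. */
  public void onStorageReport(String datanodeId, List<VolumeReport> reports) {
    List<String> failed = new ArrayList<>();
    for (VolumeReport r : reports) {
      if (r.failed()) {
        failed.add(r.storageLocation());
      }
    }
    if (failed.isEmpty()) {
      failedByDatanode.remove(datanodeId); // all volumes healthy again
    } else {
      failedByDatanode.put(datanodeId, failed);
    }
  }

  /** Renders all failed locations as a JSON array of objects. */
  public String toJson() {
    StringBuilder sb = new StringBuilder("[");
    boolean first = true;
    for (Map.Entry<String, List<String>> e : failedByDatanode.entrySet()) {
      for (String location : e.getValue()) {
        if (!first) {
          sb.append(",");
        }
        first = false;
        sb.append("{\"datanode\":").append(quote(e.getKey()))
            .append(",\"storageLocation\":").append(quote(location))
            .append("}");
      }
    }
    return sb.append("]").toString();
  }

  // Minimal JSON string quoting: escapes only backslashes and double quotes.
  private static String quote(String s) {
    return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  public static void main(String[] args) {
    FailedVolumeTracker tracker = new FailedVolumeTracker();
    tracker.onStorageReport("dn-2", List.of(
        new VolumeReport("/data/disk1", false),
        new VolumeReport("/data/disk2", true)));
    tracker.onStorageReport("dn-1", List.of(
        new VolumeReport("/data/disk3", true)));
    // prints [{"datanode":"dn-1","storageLocation":"/data/disk3"},{"datanode":"dn-2","storageLocation":"/data/disk2"}]
    System.out.println(tracker.toJson());
  }
}
```

Keying the state on each datanode's latest report means a replaced disk drops off the list as soon as the next heartbeat reports it healthy, rather than requiring a separate clear operation.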

> Track and display failed DataNode storage locations in SCM.
> -----------------------------------------------------------
>
>                 Key: HDDS-11463
>                 URL: https://issues.apache.org/jira/browse/HDDS-11463
>             Project: Apache Ozone
>          Issue Type: Sub-task
>          Components: Ozone Datanode
>            Reporter: Shilun Fan
>            Assignee: Shilun Fan
>            Priority: Major
>
> We frequently encounter under-replicated containers. These are usually
> caused by DataNode heartbeat failures or damaged disks on DataNodes, which
> lead to container loss and subsequent automatic recovery. However, it is
> difficult to identify damaged disks in a large cluster. This JIRA will allow
> DataNodes to actively report damaged disks to SCM, which can then display
> them as a list.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
