[
https://issues.apache.org/jira/browse/HDDS-15266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18080771#comment-18080771
]
Andrey Yarovoy edited comment on HDDS-15266 at 5/13/26 8:51 PM:
----------------------------------------------------------------
Why not report unhealthy containers via a metric from the DN?
was (Author: JIRAUSER311321):
why not report unhealthy containers via metrics from DN?
> Ability to trace container state transitions across a cluster
> -------------------------------------------------------------
>
> Key: HDDS-15266
> URL: https://issues.apache.org/jira/browse/HDDS-15266
> Project: Apache Ozone
> Issue Type: Wish
> Reporter: Wei-Chiu Chuang
> Priority: Major
>
> Request for proposal to improve observability around the container health
> state cluster-wide.
>
> Current state:
> * Container Audit Logs: Accurate, but scoped to a single datanode. Container
> state transitions are currently audited in HddsDispatcher.java on individual
> Data Nodes, so each datanode's transitions can be monitored locally, but
> cluster-wide health tracking is difficult. Container health state requires
> a holistic view across the cluster. For example: when does a container
> become over-replicated or unhealthy, at what timestamp, and at which
> corresponding Ratis transaction ID?
>
> One possible approach is to implement a mechanism for Data Nodes to report
> container state transition events to SCM (or Recon) via heartbeats, allowing
> SCM to expose a holistic, cluster-wide metric for container transitions.
>
> Open to suggestions.
>
> References:
> [https://ozone.apache.org/docs/next/system-internals/replication/data/replication-manager#container-state-descriptions]
>
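The heartbeat-based approach described in the issue could look roughly like the following on the SCM (or Recon) side. This is only a minimal sketch under stated assumptions, not Ozone code: ContainerTransitionAggregator, TransitionEvent, State, onHeartbeat, countTransitionsInto and trace are all hypothetical names, the state list is illustrative, and the real heartbeat protocol is not modelled. It assumes each datanode attaches its recent transition events to a heartbeat, and that the receiver keeps a per-state counter (a candidate for a cluster-wide metric) plus a per-container history.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these types exist in Apache Ozone. */
final class ContainerTransitionAggregator {

  /** Illustrative replica states (assumed, not the real Ozone enum). */
  enum State { OPEN, CLOSING, CLOSED, UNHEALTHY }

  /** One transition, as a datanode might attach it to a heartbeat. */
  static final class TransitionEvent {
    final long containerId;
    final String datanodeId;
    final State from;
    final State to;
    final long timestampMillis;

    TransitionEvent(long containerId, String datanodeId,
                    State from, State to, long timestampMillis) {
      this.containerId = containerId;
      this.datanodeId = datanodeId;
      this.from = from;
      this.to = to;
      this.timestampMillis = timestampMillis;
    }
  }

  // Cluster-wide count of transitions into each state.
  private final Map<State, Long> transitionsInto = new EnumMap<>(State.class);
  // Every event received so far, in arrival order.
  private final List<TransitionEvent> history = new ArrayList<>();

  /** Record the events carried by one heartbeat from one datanode. */
  synchronized void onHeartbeat(List<TransitionEvent> events) {
    for (TransitionEvent e : events) {
      history.add(e);
      transitionsInto.merge(e.to, 1L, Long::sum);
    }
  }

  /** Number of transitions into the given state seen across all datanodes. */
  synchronized long countTransitionsInto(State state) {
    return transitionsInto.getOrDefault(state, 0L);
  }

  /** All known transitions of one container, ordered by timestamp. */
  synchronized List<TransitionEvent> trace(long containerId) {
    List<TransitionEvent> result = new ArrayList<>();
    for (TransitionEvent e : history) {
      if (e.containerId == containerId) {
        result.add(e);
      }
    }
    result.sort(Comparator.comparingLong(e -> e.timestampMillis));
    return result;
  }
}
```

Sorting in trace() matters because heartbeats from different datanodes can arrive out of order. A fuller version might also carry the Ratis transaction ID mentioned in the issue as an extra event field.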