Wei-Chiu Chuang created HDDS-15266:
--------------------------------------
Summary: Ability to trace container state transitions across a
cluster
Key: HDDS-15266
URL: https://issues.apache.org/jira/browse/HDDS-15266
Project: Apache Ozone
Issue Type: Wish
Reporter: Wei-Chiu Chuang
This is a request for proposals to improve cluster-wide observability of
container health state.
Current state:
* Container Audit Logs: Accurate, but scoped to a single Data Node. Container
state transitions are currently audited in HddsDispatcher.java on individual
Data Nodes, so each audit log covers only the transitions seen by that Data
Node. This makes cluster-wide health tracking difficult, because container
health state requires a holistic view across the cluster. For example: when did
a container become over-replicated, when did it become unhealthy, and at what
timestamp and corresponding Ratis transaction ID?
One possible approach is to implement a mechanism for Data Nodes to report
container state transition events to SCM (or Recon) via heartbeats, allowing
SCM to expose a holistic, cluster-wide metric for container transitions.
Open to suggestions.
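To make the heartbeat-based idea above more concrete, here is a rough sketch of what the SCM-side (or Recon-side) aggregation might look like. Everything in it is an assumption for discussion: ContainerTransitionEvent, ContainerTransitionTracker, onHeartbeat, timeline and count are hypothetical names, not existing Ozone classes or APIs, and states are plain strings rather than the real protobuf enums.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical event a Data Node would batch into its heartbeat.
// Not an existing Ozone type; field names are illustrative only.
final class ContainerTransitionEvent {
  final long containerId;
  final String datanodeId;
  final String fromState;
  final String toState;
  final long timestampMillis;
  final long ratisTxId; // assumed: Ratis transaction id tied to the transition

  ContainerTransitionEvent(long containerId, String datanodeId,
      String fromState, String toState, long timestampMillis, long ratisTxId) {
    this.containerId = containerId;
    this.datanodeId = datanodeId;
    this.fromState = fromState;
    this.toState = toState;
    this.timestampMillis = timestampMillis;
    this.ratisTxId = ratisTxId;
  }
}

// Hypothetical aggregator living in SCM or Recon.
final class ContainerTransitionTracker {
  // Per-container history of events reported by any Data Node.
  private final Map<Long, List<ContainerTransitionEvent>> history =
      new TreeMap<>();
  // Cluster-wide counter per "FROM->TO" transition, usable as a metric.
  private final Map<String, Long> transitionCounts = new TreeMap<>();

  // Called with the events one Data Node attached to one heartbeat.
  synchronized void onHeartbeat(List<ContainerTransitionEvent> events) {
    for (ContainerTransitionEvent e : events) {
      history.computeIfAbsent(e.containerId, k -> new ArrayList<>()).add(e);
      transitionCounts.merge(e.fromState + "->" + e.toState, 1L, Long::sum);
    }
  }

  // Cluster-wide timeline for one container, ordered by event timestamp.
  synchronized List<ContainerTransitionEvent> timeline(long containerId) {
    List<ContainerTransitionEvent> copy = new ArrayList<>(
        history.getOrDefault(containerId,
            Collections.<ContainerTransitionEvent>emptyList()));
    copy.sort((a, b) -> Long.compare(a.timestampMillis, b.timestampMillis));
    return copy;
  }

  // Number of times the given transition was reported across the cluster.
  synchronized long count(String fromState, String toState) {
    return transitionCounts.getOrDefault(fromState + "->" + toState, 0L);
  }
}
```

With something along these lines, timeline(containerId) would answer "when did this container become unhealthy, on which Data Node, and at which Ratis transaction id", while count("CLOSED", "UNHEALTHY") could back a cluster-wide metric. The sketch orders events by the timestamps the Data Nodes report, so it assumes reasonably synchronized clocks; it also keeps unbounded history in memory, which a real design would need to cap or persist.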
References:
[https://ozone.apache.org/docs/next/system-internals/replication/data/replication-manager#container-state-descriptions]