Ethan Rose created HDDS-11341:
---------------------------------

             Summary: Add dashboard for HDDS health and replication progress
                 Key: HDDS-11341
                 URL: https://issues.apache.org/jira/browse/HDDS-11341
             Project: Apache Ozone
          Issue Type: Improvement
          Components: Ozone Dashboards
            Reporter: Ethan Rose


Add a Grafana dashboard that shows datanode health, ongoing and pending 
replication and reconstruction tasks, and the amount of data being moved 
between nodes by these tasks. This dashboard will be useful for monitoring the 
cluster during disk failure, node failure, node decommissioning, and maintenance.

The SCM replication manager likely already exposes many of the metrics for 
ongoing tasks. We may need to add more metrics to datanodes to track tasks that 
are in progress (not just those that are queued) and the amount of data being 
moved. I think some datanode command queue and handler metrics are also unused; 
those can be checked and then removed or updated as part of this PR.
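
As a rough illustration of the datanode-side gap described above, the sketch 
below keeps separate counts for queued and in-progress tasks plus a running 
total of bytes moved. The class and method names are hypothetical and are not 
taken from the Ozone code base; a real change would presumably register these 
values with the existing datanode metrics system rather than use plain counters.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch only: not an existing Ozone class.
 * Tracks queued tasks, in-progress tasks, and bytes moved separately.
 */
class ReplicationTaskMetrics {
  private final AtomicLong queuedTasks = new AtomicLong();
  private final AtomicLong inProgressTasks = new AtomicLong();
  private final AtomicLong bytesMoved = new AtomicLong();

  /** A command was received and is waiting for a handler thread. */
  void taskQueued() {
    queuedTasks.incrementAndGet();
  }

  /** A handler picked the task up: it leaves the queue and becomes in progress. */
  void taskStarted() {
    queuedTasks.decrementAndGet();
    inProgressTasks.incrementAndGet();
  }

  /** The task ended; record how many bytes it transferred. */
  void taskFinished(long bytes) {
    inProgressTasks.decrementAndGet();
    bytesMoved.addAndGet(bytes);
  }

  long getQueuedTasks() {
    return queuedTasks.get();
  }

  long getInProgressTasks() {
    return inProgressTasks.get();
  }

  long getBytesMoved() {
    return bytesMoved.get();
  }
}
```

With counters like these, a dashboard panel could plot queued and in-progress 
tasks as separate series and derive a data movement rate from the bytes total.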



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

