Mark Gui created HDDS-5401:
------------------------------
Summary: Add more metrics to ReplicationManager to help monitor
replication progress
Key: HDDS-5401
URL: https://issues.apache.org/jira/browse/HDDS-5401
Project: Apache Ozone
Issue Type: Improvement
Reporter: Mark Gui
Assignee: Mark Gui
Currently, the SCM ReplicationManager exposes only two metrics:
inflightReplication and inflightDeletion.
We could add more metrics to better monitor replication progress (e.g., via
Prometheus).
With these, we could also estimate the time needed to complete the whole
replication.
Some proposed metrics:
* number of replicate/delete commands sent
* number of replicate/delete commands completed
* number of replicate/delete commands timed out
These metrics would be refreshed on each replication round (300s by default).
We could then calculate how many replicate/delete commands completed between
two successive rounds and how many are still in flight, and from that estimate
how much more time is needed.
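
The estimate described above can be sketched as follows. This is only an
illustration, assuming the proposed counters are cumulative and are sampled
once per round; the class and method names are hypothetical and are not part
of Ozone.

```java
import java.util.OptionalLong;

/** Hypothetical helper for illustration only; not an Ozone class. */
final class ReplicationEta {
  private ReplicationEta() { }

  /**
   * Estimates the seconds needed to finish the commands still in flight,
   * using the completion count observed over the last round as the rate.
   * All counters are assumed to be cumulative.
   */
  static OptionalLong estimateSeconds(long prevCompleted, long currCompleted,
      long currSent, long currTimedOut, long roundSeconds) {
    // Commands sent but neither completed nor timed out yet.
    long inFlight = currSent - currCompleted - currTimedOut;
    if (inFlight <= 0) {
      return OptionalLong.of(0L);
    }
    // Commands completed between the two successive rounds.
    long completedInRound = currCompleted - prevCompleted;
    if (completedInRound <= 0) {
      // No progress observed in the last round, so no rate is available.
      return OptionalLong.empty();
    }
    double seconds = inFlight * (double) roundSeconds / completedInRound;
    return OptionalLong.of(Math.round(seconds));
  }
}
```

For example, with 400 commands sent, 150 completed (100 at the previous
round), 50 timed out and a 300s round, 200 commands remain at 50 per round,
which gives 4 rounds or 1200 seconds. The figure covers only commands already
sent, so it is a lower bound if more work is still to be scheduled.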
Two more metrics would make the estimate more accurate, since closed containers
can differ in size:
* total number of bytes to replicate
* number of bytes already replicated
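
A byte-weighted estimate could look like the sketch below. It assumes both
byte metrics are cumulative counters sampled once per round; the class and
method names are hypothetical and are not part of Ozone.

```java
import java.util.OptionalLong;

/** Hypothetical helper for illustration only; not an Ozone class. */
final class ReplicationByteEta {
  private ReplicationByteEta() { }

  /**
   * Estimates the seconds needed to replicate the remaining bytes, using the
   * bytes completed during the last round as the throughput.
   */
  static OptionalLong estimateSeconds(long prevCompletedBytes,
      long currCompletedBytes, long totalBytes, long roundSeconds) {
    long remainingBytes = totalBytes - currCompletedBytes;
    if (remainingBytes <= 0) {
      return OptionalLong.of(0L);
    }
    // Bytes replicated between the two successive rounds.
    long bytesInRound = currCompletedBytes - prevCompletedBytes;
    if (bytesInRound <= 0) {
      // No bytes completed in the last round, so no throughput is available.
      return OptionalLong.empty();
    }
    double seconds = remainingBytes * (double) roundSeconds / bytesInRound;
    return OptionalLong.of(Math.round(seconds));
  }
}
```

Weighting by bytes avoids treating a nearly empty container and a full one as
the same amount of work, which the command counts alone cannot distinguish.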