[ 
https://issues.apache.org/jira/browse/HDDS-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-5401:
---------------------------------
    Labels: pull-request-available  (was: )

> Add more metrics to ReplicationManager to help monitor replication progress
> ---------------------------------------------------------------------------
>
>                 Key: HDDS-5401
>                 URL: https://issues.apache.org/jira/browse/HDDS-5401
>             Project: Apache Ozone
>          Issue Type: Improvement
>            Reporter: Mark Gui
>            Assignee: Mark Gui
>            Priority: Major
>              Labels: pull-request-available
>
> Currently the SCM ReplicationManager exposes only two metrics:
> inflightReplication and inflightDeletion.
> We could add more metrics to better monitor replication progress (e.g. via
> Prometheus).
> With them we could also estimate the time needed to complete the whole
> replication.
> Some proposed metrics:
>  * number of replicate/delete cmds sent
>  * number of replicate/delete cmds completed
>  * number of replicate/delete cmds timed out
> These metrics will be refreshed every replication round (300s by default).
> So we could calculate how many replicate/delete cmds completed between two
> successive rounds and how many are still in flight, and from that estimate
> how much more time is needed.
> Two more metrics would make the estimate more accurate, since closed
> containers can differ in size:
>  * number of replicate bytes total
>  * number of replicate bytes completed
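
The estimate described in the issue can be sketched as follows. This is a
minimal illustration, not code from Ozone: the class ReplicationEta and its
method are hypothetical, and it assumes the proposed metrics are cumulative
counters sampled once per replication round. The same arithmetic works for
command counts and for byte counts.

```java
import java.util.OptionalDouble;

/** Hypothetical helper for illustration only; not part of Apache Ozone. */
public final class ReplicationEta {

  private ReplicationEta() { }

  /**
   * Estimates the seconds remaining from two successive samples of a
   * cumulative "completed" counter (commands or bytes).
   *
   * @param completedPrev counter value at the previous round
   * @param completedNow  counter value at the current round
   * @param outstanding   work still pending (in-flight commands, or
   *                      total bytes minus completed bytes)
   * @param roundSeconds  time between the two samples (300 by default,
   *                      per the issue description)
   * @return the estimate, or empty if no progress was observed
   */
  public static OptionalDouble estimateSecondsRemaining(
      long completedPrev, long completedNow,
      long outstanding, long roundSeconds) {
    if (outstanding <= 0) {
      return OptionalDouble.of(0.0);   // nothing left to do
    }
    long delta = completedNow - completedPrev;
    if (delta <= 0 || roundSeconds <= 0) {
      return OptionalDouble.empty();   // no rate can be derived
    }
    // remaining = outstanding / (delta / roundSeconds)
    return OptionalDouble.of((double) outstanding * roundSeconds / delta);
  }

  public static void main(String[] args) {
    // Made-up sample values: 60 commands finished in one 300s round,
    // 240 commands still in flight.
    System.out.println(estimateSecondsRemaining(120, 180, 240, 300));

    // Byte-based variant with made-up sizes: 10 GiB finished in the
    // last round, 30 GiB of 50 GiB still to replicate.
    long gib = 1L << 30;
    long total = 50 * gib;
    long donePrev = 10 * gib;
    long doneNow = 20 * gib;
    System.out.println(
        estimateSecondsRemaining(donePrev, doneNow, total - doneNow, 300));
  }
}
```

The byte-based variant is the one the last two proposed metrics enable: when
containers differ in size, bytes per round is a steadier rate than commands
per round. An empty result means no progress was seen between the two
samples, so no estimate is possible for that round.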



