[ 
https://issues.apache.org/jira/browse/HDDS-9324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17772291#comment-17772291
 ] 

Stephen O'Donnell commented on HDDS-9324:
-----------------------------------------

We discussed this in a meeting today.

We have metrics that show the number of pending containers, pending pipelines, 
etc. per node for all decommissioning hosts. It would be pretty simple to 
display these using an `ozone insight` command, or better, 
`ozone admin datanode decommissionStatus`.

I think that is a simple change that would add some value and make the metrics 
more accessible. The client could simply poll the SCM /prom endpoint to get 
the list of metrics, like the insight commands do.
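
As a rough sketch of the polling idea above, the client side mostly comes down 
to parsing the Prometheus text output and grouping the decommission metrics by 
host. The metric names, the `host` label and the `example_decommission_` 
prefix below are hypothetical placeholders, not the names SCM actually 
exports, and the sample text stands in for a real HTTP response.

```python
import re
from collections import defaultdict

# Hypothetical sample of Prometheus text output. The metric and label names
# are illustrative assumptions, not the names SCM actually exports.
SAMPLE = """\
# HELP example_decommission_pending_containers Hypothetical metric
# TYPE example_decommission_pending_containers gauge
example_decommission_pending_containers{host="dn1"} 12
example_decommission_pending_containers{host="dn2"} 0
example_decommission_pending_pipelines{host="dn1"} 2
example_decommission_pending_pipelines{host="dn2"} 1
unrelated_metric{host="dn1"} 99
"""

# Matches lines of the form: name{label="value",...} number
LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def decommission_status(text, prefix="example_decommission_"):
    """Group metrics whose name starts with `prefix` by their host label."""
    status = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and HELP/TYPE comments.
        if not line or line.startswith("#"):
            continue
        match = LINE.match(line)
        if not match or not match.group("name").startswith(prefix):
            continue
        labels = dict(LABEL.findall(match.group("labels")))
        host = labels.get("host", "unknown")
        short_name = match.group("name")[len(prefix):]
        status[host][short_name] = float(match.group("value"))
    return dict(status)


if __name__ == "__main__":
    for host, metrics in sorted(decommission_status(SAMPLE).items()):
        print(host, metrics)
```

A real client would fetch the text over HTTP from the SCM web address instead 
of using `SAMPLE`, and the actual command would presumably be written in Java 
alongside the other admin commands.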

Displaying the list of under-replicated containers is a more difficult change. 
The `container admin replication report` command needed new protobuf messages, 
etc. It may be possible to reuse those; however, the under-replicated 
containers on the cluster are already listed by the report command. 

If decommission is stuck, the existing report can be used to find the 
containers that are stuck, as can the SCM log, since the decommission monitor 
prints them on each iteration.

If many nodes are decommissioning, the report command should show a mix of 
the under-replicated containers from all of those nodes.

In the interest of getting something simple working first, I think we should 
rely on the existing replication report for the list of under-replicated 
containers on the cluster, and then simply display the existing metrics in a 
new `ozone admin datanode decommissionStatus` command. In the future, if we 
feel we need the decommissionStatus command to list individual containers, we 
can add that.

> Add CLI to view status of datanode decommissioning
> --------------------------------------------------
>
>                 Key: HDDS-9324
>                 URL: https://issues.apache.org/jira/browse/HDDS-9324
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: SCM
>            Reporter: Ethan Rose
>            Assignee: Tejaskriya Madhan
>            Priority: Major
>
> Currently, monitoring decommissioning status from the CLI is done using 
> {{ozone admin datanode list}} and checking the status of the nodes. If there 
> are specific containers waiting to be replicated that are blocking 
> decommissioning, the SCM log needs to be checked. In this Jira, I propose 
> adding an {{ozone admin datanode decommission status}} command that will list 
> only the datanodes that are currently decommissioning, and any replicas they 
> have that are blocking the decommissioning.


