[
https://issues.apache.org/jira/browse/HDDS-9324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17772291#comment-17772291
]
Stephen O'Donnell commented on HDDS-9324:
-----------------------------------------
We discussed this in a meeting today.
We have metrics that show the number of pending containers, pending pipelines,
etc. per node for all decommissioning hosts. It would be pretty simple to
display these using an `ozone insight` command, or better,
`ozone admin datanode decommissionStatus`.
I think that is a simple change that would add some value and make the metrics
more accessible. The client could simply poll the SCM /prom endpoint to get
the list of metrics, like the insight commands do.
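As a rough sketch of the client side, the snippet below parses Prometheus
text-format output and groups the decommission metrics per host. The metric
names, the `node_decommission_` prefix, the `host` label and the sample values
are all assumptions made up for illustration, not the names SCM actually
exposes. A real client would fetch the text from the SCM /prom endpoint
instead of using the embedded sample.

```python
import re
from collections import defaultdict

# Hypothetical sample of Prometheus text-format output. The metric names,
# the "host" label and the values are assumptions, not real SCM output.
SAMPLE = """\
# HELP node_decommission_pending_containers hypothetical metric
node_decommission_pending_containers{host="dn1.example.com"} 12
node_decommission_pending_pipelines{host="dn1.example.com"} 1
node_decommission_pending_containers{host="dn2.example.com"} 0
node_decommission_pending_pipelines{host="dn2.example.com"} 0
"""

# Matches lines of the form: metric_name{label="value",...} number
LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_decommission_metrics(text, prefix="node_decommission_"):
    """Return {host: {metric_suffix: value}} for metrics starting with prefix."""
    per_host = defaultdict(dict)
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        match = LINE.match(line)
        if not match or not match.group("name").startswith(prefix):
            continue
        labels = dict(LABEL.findall(match.group("labels")))
        host = labels.get("host")
        if host is None:
            continue  # not a per-node metric
        suffix = match.group("name")[len(prefix):]
        per_host[host][suffix] = float(match.group("value"))
    return dict(per_host)


def format_status(per_host):
    """Render one line per decommissioning host, sorted by host name."""
    lines = []
    for host in sorted(per_host):
        fields = ", ".join(
            f"{name}={int(value)}"
            for name, value in sorted(per_host[host].items()))
        lines.append(f"{host}: {fields}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_status(parse_decommission_metrics(SAMPLE)))
```

Keeping the parsing separate from the fetching means the same function could
be fed from an HTTP poll or from a saved metrics dump when testing.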
Displaying the list of under-replicated containers is a more difficult change.
The `container admin replication report` command needed new protobuf messages
etc. It may be possible to reuse those. However, the under-replicated
containers on the cluster are already listed via the report command.
If decommission is stuck, the existing report can be used to find the
containers that are stuck. The SCM log can be used too, as the decommission
monitor prints them on each iteration.
If many nodes are decommissioning, then the report command should show a mix of
the under-replicated containers from all of those nodes.
In the interest of getting something simple working first, I think we should
rely on the existing replication report for the list of under-replicated
containers on the cluster, and then simply display the existing metrics in a
new `ozone admin datanode decommissionStatus` command. In the future, if we feel
we need the decommissionStatus command to list out individual containers, we can
add that.
> Add CLI to view status of datanode decommissioning
> --------------------------------------------------
>
> Key: HDDS-9324
> URL: https://issues.apache.org/jira/browse/HDDS-9324
> Project: Apache Ozone
> Issue Type: Bug
> Components: SCM
> Reporter: Ethan Rose
> Assignee: Tejaskriya Madhan
> Priority: Major
>
> Currently, monitoring decommissioning status from the CLI is done using
> {{ozone admin datanode list}} and checking the status of the nodes. If there
> are specific containers waiting to be replicated that are blocking
> decommissioning, the SCM log needs to be checked. In this Jira, I propose
> adding an {{ozone admin datanode decommission status}} command that will list
> only the datanodes that are currently decommissioning, and any replicas they
> have that are blocking the decommissioning.