[ 
https://issues.apache.org/jira/browse/HDDS-11481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885694#comment-17885694
 ] 

Shilun Fan commented on HDDS-11481:
-----------------------------------

[~erose] 

Thank you very much for your feedback. I agree that future page-related 
development should continue in Recon rather than in SCM. We also use Recon 
internally, and I have noticed several recent improvements to it. I will 
backport them to our internal system, as Recon is a very useful component.

- Rather than for completeness, what is the use case for this feature? In what 
scenario would we want the datanode process running but not registered or 
reporting to SCM?

I thought of this feature because of a problem we have run into. One of our 
clusters has 1,500 nodes. After SCM restarts, most nodes register successfully, 
but a few DNs (usually 5-6) always take a long time to register. I would like 
a quick way to identify the DNs that have not yet registered.

Currently, I do it this way: since SCM runs in HA mode, I compare the list of 
DNs from the restarted SCM with the list of DNs from an SCM that was not 
restarted, and the DNs missing from the restarted SCM are the ones that have 
not registered. This process is somewhat cumbersome.
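
To make the comparison concrete, here is a minimal sketch of the manual check 
described above. It assumes the DN hostnames have already been collected from 
each SCM by some other means; the function name and the example hostnames are 
hypothetical and are not part of Ozone.

```python
def find_unregistered(reference_dns, restarted_dns):
    """Return DNs known to the non-restarted SCM but missing from the restarted one.

    Both arguments are iterables of DN hostnames (assumed to be gathered
    separately; how they are gathered is outside this sketch).
    """
    return sorted(set(reference_dns) - set(restarted_dns))


if __name__ == "__main__":
    # Hypothetical hostnames, for illustration only.
    reference = ["dn1.example.com", "dn2.example.com", "dn3.example.com"]
    restarted = ["dn1.example.com", "dn3.example.com"]
    for host in find_unregistered(reference, restarted):
        print(host)
```

The set difference only works while at least one SCM still holds the full 
list, which is why it cannot replace a statically configured list of DNs.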

I wonder if we can take a cue from HDFS. In HDFS, we maintain a list of DNs 
through the `slaves` configuration file. This way, after the NN restarts, we 
can easily know how many DNs are in the cluster. If SCM had this feature, we 
could display the unregistered DNs as "UNKNOWN" status in the DN list, making 
it easier to quickly identify those that have not registered.
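
As a rough sketch of this idea (not an existing SCM feature), the expected DNs 
could be read from a host list file with one hostname per line, and any 
expected DN that SCM has not seen would be labelled "UNKNOWN". The file 
format, the function names, and the state strings other than "UNKNOWN" are 
assumptions made for illustration.

```python
def parse_host_list(text):
    """Parse a host list: one hostname per line, '#' starts a comment."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line)
    return hosts


def label_datanodes(expected_hosts, registered_states):
    """Map each expected DN to its reported state, or "UNKNOWN" if unregistered.

    registered_states is a dict of hostname -> state as reported by SCM
    (hypothetical input for this sketch).
    """
    return {host: registered_states.get(host, "UNKNOWN") for host in expected_hosts}


if __name__ == "__main__":
    # Hypothetical host list and registration data, for illustration only.
    host_file = "dn1.example.com\n# rack 2\ndn2.example.com\ndn3.example.com\n"
    registered = {"dn1.example.com": "HEALTHY", "dn3.example.com": "HEALTHY"}
    for host, state in label_datanodes(parse_host_list(host_file), registered).items():
        print(host, state)
```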

> Enhanced SCM Support for DataNode Management
> --------------------------------------------
>
>                 Key: HDDS-11481
>                 URL: https://issues.apache.org/jira/browse/HDDS-11481
>             Project: Apache Ozone
>          Issue Type: Wish
>          Components: SCM
>            Reporter: Shilun Fan
>            Assignee: Shilun Fan
>            Priority: Major
>         Attachments: screenshot-1.png
>
>
> I plan to enhance SCM's support for DataNode management, including features 
> like blacklist and whitelist.
> Compared to the DataNode management functionality in HDFS, SCM's DataNode 
> management still has some incomplete features:
> 1. For instance, the blacklist and whitelist functionality is missing. 
> Currently, all DataNodes can register with SCM once they are started, but for 
> the sake of completeness, we should implement a blacklist feature.
> 2. The display list function for DataNodes in SCM is not user-friendly, with 
> the following issues: 
> - The list does not support global sorting. 
> - It cannot display the decommissioning progress. Once the decommissioning 
> process begins, we can only passively refresh the page or rely on metrics to 
> make judgments. 
> - Key information about DataNodes is missing from the list, such as the 
> number of containers and the number of pipelines.
> 3. In HDFS, if multiple DataNode versions are detected in the cluster, there 
> are helpful prompts, but SCM's recognition and support for multiple DataNode 
> versions are insufficient.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)