[ 
https://issues.apache.org/jira/browse/HDFS-12643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421453#comment-17421453
 ] 

Kihwal Lee commented on HDFS-12643:
-----------------------------------

Probably the missing information is that the cluster nodes need to be actively 
managed using {{dfs.hosts}} in order to use the maintenance mode feature.  This 
was likely overlooked because most big organizations already use either the old 
hosts file or the new combined hosts file to manage cluster membership; 
decommissioning, for example, also requires hosts-file-based cluster membership 
management.  At a minimum, the documentation needs to be updated.
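
For reference, a minimal sketch of what that set-up typically looks like in 
{{hdfs-site.xml}} when the combined (JSON) hosts file is used; the file path 
below is only illustrative:

{code}
<!-- Switch the NameNode to the combined (JSON) hosts file manager -->
<property>
  <name>dfs.namenode.hosts.provider.classname</name>
  <value>org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager</value>
</property>
<!-- Point dfs.hosts at the JSON file holding cluster membership and admin states
     (the path is just an example) -->
<property>
  <name>dfs.hosts</name>
  <value>/etc/hadoop/conf/dfs.hosts.json</value>
</property>
{code}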


> HDFS maintenance state behaviour is confusing and not well documented
> ---------------------------------------------------------------------
>
>                 Key: HDFS-12643
>                 URL: https://issues.apache.org/jira/browse/HDFS-12643
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: documentation, namenode
>            Reporter: Andre Araujo
>            Priority: Major
>
> The current implementation of the HDFS maintenance state feature is confusing 
> and error-prone. The documentation is missing important information that's 
> required for the correct use of the feature.
> For example, if the Hadoop admin wants to put a single node in maintenance 
> state, he/she can add a single entry to the maintenance file with the 
> contents:
> {code}
> {
>    "hostName": "host-1.example.com",
>    "adminState": "IN_MAINTENANCE",
>    "maintenanceExpireTimeInMS": 1507663698000
> }
> {code}
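> For the change to take effect, the node list then has to be refreshed, 
> typically with:
> {code}
> hdfs dfsadmin -refreshNodes
> {code}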
> Let's say now that the actual maintenance finished well before the set 
> expiration time and the Hadoop admin wants to bring the node back to the 
> NORMAL state. It would be natural to simply change the state of the node, as 
> shown below, and run another refresh:
> {code}
> {
>    "hostName": "host-1.example.com",
>    "adminState": "NORMAL"
> }
> {code}
> The configuration file above, though, not only takes the node {{host-1}} out 
> of maintenance state but also *blacklists all the other DataNodes*. This 
> behaviour seems inconsistent to me and is due to {{emptyInServiceNodeLists}} 
> being set to {{false}} 
> [here|https://github.com/apache/hadoop/blob/230b85d5865b7e08fb7aaeab45295b5b966011ef/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CombinedHostFileManager.java#L80]
>  only when there is at least one node with {{adminState = NORMAL}} listed in 
> the file.
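> With the current behaviour, the only way to avoid this appears to be listing 
> *every* cluster node in the file, for example as below ({{host-2}} and 
> {{host-3}} are placeholders for the remaining DataNodes; entries without an 
> {{adminState}} default to NORMAL):
> {code}
> {
>    "hostName": "host-1.example.com",
>    "adminState": "NORMAL"
> }
> {
>    "hostName": "host-2.example.com"
> }
> {
>    "hostName": "host-3.example.com"
> }
> {code}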
> I believe that it would be more consistent, and less error-prone, to simply 
> implement the following:
> * If the {{dfs.hosts}} file is empty, all nodes are allowed and in normal state
> * If the file is not empty, any host *not* listed in the file is 
> *blacklisted*, regardless of the state of the hosts listed in the file.
> Regardless of whether the implementation is changed, the documentation also 
> needs to be updated to ensure that readers are aware of the caveats mentioned above.



