[ 
https://issues.apache.org/jira/browse/HDFS-7877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356375#comment-14356375
 ] 

Ming Ma commented on HDFS-7877:
-------------------------------

Thanks Eddy for the review and suggestions. Please find my response below. 
Chris might have more to add.

bq. Why is the node state the combination of <live|dead> and In 
service|Decommissioned|In maintenance..?
There are two state machines for a datanode. One is the liveness state; the 
other is the admin state. HDFS-7521 has some discussion around that. So a 
datanode can be in any combination of these two states. That is why we have 
the case where, if a node becomes dead while it is being decommissioned, it 
remains in the {{DECOMMISSION_IN_PROGRESS}} state until all of its blocks are 
properly replicated.
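To illustrate the two independent state machines, here is a minimal sketch (the class and method names are illustrative, not the actual HDFS types):

```java
// Hypothetical sketch: a datanode's overall state is the cross product of
// two independent state machines (liveness and admin), so a node can be
// dead while still DECOMMISSION_IN_PROGRESS.
public class NodeState {
    public enum Liveness { LIVE, DEAD }
    public enum Admin { IN_SERVICE, DECOMMISSION_IN_PROGRESS, DECOMMISSIONED,
                        ENTERING_MAINTENANCE, IN_MAINTENANCE }

    private Liveness liveness = Liveness.LIVE;
    private Admin admin = Admin.IN_SERVICE;

    // Losing heartbeats only flips the liveness axis; the admin state is
    // untouched, so a decommissioning node that dies stays
    // DECOMMISSION_IN_PROGRESS until its blocks are fully replicated.
    public void markDead() { liveness = Liveness.DEAD; }
    public void startDecommission() { admin = Admin.DECOMMISSION_IN_PROGRESS; }

    public Liveness getLiveness() { return liveness; }
    public Admin getAdmin() { return admin; }
}
```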
 

bq. After NN re-starts, I think NN could not find out whether DN is in 
enter_maintenance or in_maintenance mode? 
The design handles the datanode state management for {{ENTERING_MAINTENANCE}} 
and {{IN_MAINTENANCE}} somewhat similar to {{DECOMMISSION_IN_PROGRESS}} and 
{{DECOMMISSIONED}} in the following ways.

1. When a node registers with the NN (whether due to a datanode restart or an 
NN restart), it first transitions to {{DECOMMISSION_IN_PROGRESS}} if it is in 
the exclude file, or to {{ENTERING_MAINTENANCE}} if it is in the maintenance 
file.
2. Only after the target replication has been reached does it transition to 
the final state, {{DECOMMISSIONED}} or {{IN_MAINTENANCE}}.
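The two steps above can be sketched as follows (hypothetical names, not the actual {{DatanodeManager}} code):

```java
// Hedged sketch of the registration-time transitions described above:
// on registration the NN only ever puts a node into the transitional state;
// the terminal state is reached later, once replication completes.
public class RegistrationSketch {
    public enum Admin { IN_SERVICE, DECOMMISSION_IN_PROGRESS, DECOMMISSIONED,
                        ENTERING_MAINTENANCE, IN_MAINTENANCE }

    // Step 1: registration (after DN restart or NN restart).
    public static Admin onRegister(boolean inExcludeFile,
                                   boolean inMaintenanceFile) {
        if (inExcludeFile)     return Admin.DECOMMISSION_IN_PROGRESS;
        if (inMaintenanceFile) return Admin.ENTERING_MAINTENANCE;
        return Admin.IN_SERVICE;
    }

    // Step 2: only once every block has reached its target replication does
    // the node move to the terminal state.
    public static Admin onReplicationComplete(Admin current) {
        switch (current) {
            case DECOMMISSION_IN_PROGRESS: return Admin.DECOMMISSIONED;
            case ENTERING_MAINTENANCE:     return Admin.IN_MAINTENANCE;
            default:                       return current;
        }
    }
}
```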

bq. Moreover, after NN restarts, if a DN is actually in the maintenance mode 
(DN is shutting down for maintenance), NN could not receive block reports from 
this DN.
After the NN restarts, if a DN listed in the maintenance file doesn't register 
with the NN, it won't be in {{DatanodeManager}}'s {{datanodeMap}} and thus its 
state won't be tracked. This is similar to how decommission is handled.

If the DN does register with the NN, there is a bug in the current patch: it 
doesn't check whether the NN has received a block report from the DN, so it 
can prematurely transition the DN to the {{IN_MAINTENANCE}} state.
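The intended guard could look like this (an illustrative sketch of the fix, not the patch itself):

```java
// Illustrative fix for the bug mentioned above (hypothetical names):
// do not move ENTERING_MAINTENANCE -> IN_MAINTENANCE until the NN has both
// received a block report from the DN and confirmed that replication
// targets are met.
public class MaintenanceTransitionCheck {
    public static boolean canEnterMaintenance(boolean blockReportReceived,
                                              boolean replicationTargetMet) {
        // Without a block report the NN cannot know which replicas the DN
        // holds, so transitioning would be premature.
        return blockReportReceived && replicationTargetMet;
    }
}
```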

bq. Is "put the dead node into maintenance mode" necessary?
Good question, i.e., whether it is ok to keep the node in the {{dead, normal}} 
state when admins add it to the maintenance file.

The intention is to keep the node state consistent with the actual content of 
the maintenance file. It is similar to how decommission is handled: if you add 
a dead node to the exclude file, the node goes directly into the 
{{DECOMMISSIONED}} state. For replica processing, the {{dead, in_maintenance}} 
-> {{live, in_maintenance}} transition won't trigger excess-block removal; 
{{live, in_maintenance}} -> {{live, normal}} will.
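That replica-processing rule can be sketched as a predicate (hypothetical names, reduced to just the two axes involved):

```java
// Sketch of the rule above: only a transition that returns a live node to
// normal service makes its replicas count as excess; coming back to life
// while still in maintenance does not.
public class ExcessReplicaRule {
    public enum Admin { NORMAL, IN_MAINTENANCE }
    public enum Liveness { LIVE, DEAD }

    public static boolean triggersExcessRemoval(Liveness fromLive, Admin fromAdmin,
                                                Liveness toLive, Admin toAdmin) {
        // dead,in_maintenance -> live,in_maintenance: the node is back but
        // still maintained; the extra copies made on its behalf are kept.
        // live,in_maintenance -> live,normal: the node rejoins service, so
        // any extra copies created during maintenance are now excess.
        return fromAdmin == Admin.IN_MAINTENANCE && toAdmin == Admin.NORMAL
            && toLive == Liveness.LIVE;
    }
}
```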

bq. Timeout support
Good suggestion. We discussed this topic during the design discussion. We feel 
the admin script can handle that outside HDFS: upon timeout, the admin script 
removes the node from the maintenance file, which triggers replication. If we 
supported timeout inside HDFS, nodes in the maintenance file wouldn't 
necessarily be in maintenance states. Alternatively, we could add another 
state called maintenance_timeout, but that might be too complicated. I can see 
the benefit of having a timeout here, so we would like to hear others' 
suggestions.
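The "handle timeout outside HDFS" approach could be as simple as the following sketch of an admin-side sweep (purely hypothetical tooling; HDFS itself stays unaware of the timeout):

```java
import java.util.*;

// Hypothetical admin-side sweep: track when each node entered the
// maintenance file, and drop entries older than the timeout. Rewriting the
// maintenance file without those nodes triggers re-replication on the next
// refresh.
public class MaintenanceTimeoutSweep {
    // Returns the nodes that should remain in the maintenance file.
    public static List<String> sweep(Map<String, Long> enteredAtMillis,
                                     long nowMillis, long timeoutMillis) {
        List<String> keep = new ArrayList<>();
        for (Map.Entry<String, Long> e : enteredAtMillis.entrySet()) {
            if (nowMillis - e.getValue() < timeoutMillis) {
                keep.add(e.getKey()); // still within its maintenance window
            }
        }
        Collections.sort(keep);
        return keep;
    }
}
```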


There are two new topics we want to bring up.

* The original design doc uses the cluster's default minimal replication 
factor to decide whether a node can exit the {{ENTERING_MAINTENANCE}} state. 
We might want to use a new config value instead, so that we can set it to two. 
For a scenario like Hadoop software upgrade, if used together with upgrade 
domains, "two replicas" will be met right away for most blocks. For a scenario 
like rack repair, "two replicas" gives us better data availability. At the 
very least, it lets us test different values independently of the cluster's 
minimal replication factor.

* Whether reads should be allowed on a node in the {{ENTERING_MAINTENANCE}} 
state. Perhaps we should support that; it would handle the case where that 
node holds the only available replica. We could put such a replica at the end 
of the {{LocatedBlock}} list.
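The first proposal above amounts to an exit check against a new, separately tunable threshold. A minimal sketch, assuming a hypothetical config knob (say, a "min maintenance replication" set to 2) that is independent of the cluster-wide minimal replication factor:

```java
// Sketch of the proposed exit check: a node may leave ENTERING_MAINTENANCE
// only when every one of its blocks has at least minMaintenanceReplication
// live replicas elsewhere. The knob is an assumption for illustration, not
// an existing HDFS config key.
public class MaintenanceReplicationCheck {
    public static boolean canExitEnteringMaintenance(int[] liveReplicasPerBlock,
                                                     int minMaintenanceReplication) {
        for (int live : liveReplicasPerBlock) {
            if (live < minMaintenanceReplication) return false;
        }
        return true;
    }
}
```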
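For the second proposal, placing such replicas at the end of the located-block list could look like this sketch (hypothetical types, not the actual {{LocatedBlock}} API):

```java
import java.util.*;

// Sketch of the read-ordering idea: replicas on ENTERING_MAINTENANCE nodes
// stay readable but are sorted to the end of the list, so clients prefer
// normal replicas whenever any exist.
public class ReplicaOrdering {
    public static final class Replica {
        public final String node;
        public final boolean enteringMaintenance;
        public Replica(String node, boolean enteringMaintenance) {
            this.node = node;
            this.enteringMaintenance = enteringMaintenance;
        }
    }

    public static List<Replica> orderForRead(List<Replica> replicas) {
        List<Replica> sorted = new ArrayList<>(replicas);
        // Stable sort: normal replicas first, entering-maintenance ones last.
        sorted.sort(Comparator.comparing((Replica r) -> r.enteringMaintenance));
        return sorted;
    }
}
```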



> Support maintenance state for datanodes
> ---------------------------------------
>
>                 Key: HDFS-7877
>                 URL: https://issues.apache.org/jira/browse/HDFS-7877
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Ming Ma
>         Attachments: HDFS-7877.patch, Supportmaintenancestatefordatanodes.pdf
>
>
> This requirement came up during the design for HDFS-7541. Given this feature 
> is mostly independent of upgrade domain feature, it is better to track it 
> under a separate jira. The design and draft patch will be available soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
