[ 
https://issues.apache.org/jira/browse/HDFS-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14086591#comment-14086591
 ] 

Jing Zhao commented on HDFS-6791:
---------------------------------

Thanks for working on this, [~mingma]! I also think approach 1 should be 
useful and safe: the DataNode should continue its decommission process after 
coming back, and the admin can still use refreshNodes to stop the decommission 
afterwards.
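
Just to spell out why approach 1 is safe, here is a minimal sketch (helper 
names like handleRejoin and startDecommission are assumptions for 
illustration, not the attached patch):
{code}
// When a node that was marked decommissioned while dead re-registers,
// put it back into decommission-in-progress instead of leaving it
// decommissioned. Unlike a DECOMMISSIONED node, a DECOMMISSION_INPROGRESS
// node is still eligible as a replication source in
// BlockManager#chooseSourceDatanode, so its lone replicas can be copied
// off before the admin finally stops the node.
void handleRejoin(DatanodeDescriptor node) {
  if (node.isDecommissioned()) {
    // Restart the decommission state machine; the replication monitor
    // will re-scan this node's blocks and schedule the missing replicas.
    startDecommission(node);
  }
}
{code}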

The current patch looks good to me. Some nits:
# Looks like the variable addr is not used in 
testDecommissionStatusAfterDNRestart
# In the following code, if the replication monitor thread in the block manager 
gets delayed and only starts its first scan after refreshNodes is called, the 
decommission may finish before the DN is stopped. Maybe we can also disable the 
DNs' heartbeats to make sure the replication never succeeds (see the sketch 
after the snippet below)? But this is a very rare case with very low 
probability, so this change can be optional.
{code}
+    decommissionNode(fsn, localFileSys, dnName);
+    dm.refreshNodes(conf);
+
+    // Stop the DN when decommission is in progress.
+    DataNodeProperties dataNodeProperties = cluster.stopDataNode(dnName);
{code}
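
For reference, a minimal sketch of that optional hardening, assuming the same 
test fixtures (cluster, fsn, localFileSys, dnName, dm, conf) as the attached 
patch; DataNodeTestUtils.setHeartbeatsDisabledForTests is the existing test 
hook for disabling heartbeats:
{code}
// Disable heartbeats on all DNs first: with no heartbeats, the NN cannot
// deliver replication commands or observe new replicas, so the decommission
// cannot finish before we stop the DN.
for (DataNode dn : cluster.getDataNodes()) {
  DataNodeTestUtils.setHeartbeatsDisabledForTests(dn, true);
}
decommissionNode(fsn, localFileSys, dnName);
dm.refreshNodes(conf);

// Stopping the DN is now guaranteed to happen while decommission is still
// in progress.
DataNodeProperties dataNodeProperties = cluster.stopDataNode(dnName);
{code}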

> A block could remain under replicated if all of its replicas are on 
> decommissioned nodes
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-6791
>                 URL: https://issues.apache.org/jira/browse/HDFS-6791
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: HDFS-6791.patch
>
>
> Here is the scenario.
> 1. Normally, before the NN transitions a DN to the decommissioned state, 
> enough replicas have been copied to other "in service" DNs. However, in some 
> rare situations, the cluster can get into a state where a DN is 
> decommissioned and a block's only replica is on that DN. In this state, the 
> replication count reported by fsck is 1; the block just stays under 
> replicated; applications can still read the data, since a decommissioned 
> node can still serve read traffic.
> This can happen in error situations such as DN failure or NN failover. For 
> example:
> a) A block's only replica is temporarily on node A.
> b) The decommission process is started on node A.
> c) While node A is in the "decommission-in-progress" state, node A crashes, 
> and the NN marks node A as dead.
> d) After node A rejoins the cluster, the NN marks node A as decommissioned.
> 2. In theory, the NN should take care of under replicated blocks, but it 
> doesn't in this special case, where the only replica is on a decommissioned 
> node. That is because the NN has the policy that a decommissioned node can't 
> be picked as the source node for replication:
> {noformat}
> BlockManager.java
> chooseSourceDatanode
>       // never use already decommissioned nodes
>       if(node.isDecommissioned())
>         continue;
> {noformat}
> 3. Given that the NN marks the node as decommissioned, admins will shut down 
> the datanode, and the under replicated blocks then turn into missing blocks.
> 4. The workaround is to recommission the node so that the NN can start 
> replication from it.
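
For completeness, here is a sketch of the step-4 workaround in MiniDFSCluster 
test terms. The writeConfigFile helper and excludeFile variable are 
assumptions borrowed from the usual decommission tests, not part of the 
attached patch:
{code}
// Recommission node A: clear the exclude file and ask the NN to re-read it.
// Once the node is back "in service", chooseSourceDatanode can pick it as a
// replication source again and the under replicated block gets copied off.
writeConfigFile(localFileSys, excludeFile, null);  // empty the excludes list
dm.refreshNodes(conf);                             // NN recommissions node A
{code}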


