[
https://issues.apache.org/jira/browse/HDFS-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089222#comment-14089222
]
Hudson commented on HDFS-6791:
------------------------------
FAILURE: Integrated in Hadoop-Hdfs-trunk #1830 (See
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1830/])
HDFS-6791. A block could remain under replicated if all of its replicas are on
decommissioned nodes. Contributed by Ming Ma. (jing9:
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616306)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
*
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
*
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java
*
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java
> A block could remain under replicated if all of its replicas are on
> decommissioned nodes
> ----------------------------------------------------------------------------------------
>
> Key: HDFS-6791
> URL: https://issues.apache.org/jira/browse/HDFS-6791
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Ming Ma
> Assignee: Ming Ma
> Fix For: 2.6.0
>
> Attachments: HDFS-6791-2.patch, HDFS-6791-3.patch, HDFS-6791.patch
>
>
> Here is the scenario.
> 1. Normally, before the NN transitions a DN to the decommissioned state, enough
> replicas have been copied to other "in service" DNs. However, in some rare
> situations, the cluster gets into a state where a DN is decommissioned and a
> block's only replica is on that DN. In this state, the replication count
> reported by fsck is 1 (see the fsck command after the example below); the block
> just stays under replicated; applications can still read the data, since a
> decommissioned node can serve read traffic.
> This can happen in error situations such as DN failure or NN failover. For
> example:
> a) A block's only replica is temporarily on node A.
> b) The decommission process is started on node A.
> c) While node A is in the "decommission-in-progress" state, node A crashes and
> the NN marks it as dead.
> d) After node A rejoins the cluster, the NN marks it as decommissioned.
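>
> To confirm this state on a live cluster, fsck can be pointed at the affected
> path (the path below is only a placeholder); the affected block is listed with
> a single live replica even though the target replication factor is higher:
> {noformat}
> hdfs fsck /path/to/file -files -blocks -locations
> {noformat}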
> 2. In theory, the NN should take care of under replicated blocks, but it
> doesn't in this special case where the only replica is on a decommissioned
> node. That is because the NN has the policy that "a decommissioned node can't
> be picked as the source node for replication":
> {noformat}
> // BlockManager.java, chooseSourceDatanode():
> // never use already decommissioned nodes
> if (node.isDecommissioned())
>   continue;
> {noformat}
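>
> The committed patch is not reproduced here; purely as an illustration of the
> policy question (using simplified stand-in types, not the real BlockManager
> classes), a relaxed source selection could prefer in-service replicas but fall
> back to a decommissioned replica when it holds the only remaining copy:
> {noformat}
> import java.util.List;
>
> // Standalone sketch: choose a replica to copy from, preferring live
> // "in service" nodes but accepting a decommissioned node as a last resort
> // instead of skipping it unconditionally.
> public class SourceSelectionSketch {
>   enum NodeState { IN_SERVICE, DECOMMISSION_IN_PROGRESS, DECOMMISSIONED }
>
>   static final class Replica {
>     final String datanodeId;
>     final NodeState state;
>     Replica(String datanodeId, NodeState state) {
>       this.datanodeId = datanodeId;
>       this.state = state;
>     }
>   }
>
>   // Returns the replica to copy from, or null if there is no replica at all.
>   static Replica chooseSource(List<Replica> replicas) {
>     Replica decommissionedFallback = null;
>     for (Replica r : replicas) {
>       if (r.state == NodeState.DECOMMISSIONED) {
>         decommissionedFallback = r;  // remember it, but keep looking
>         continue;
>       }
>       return r;  // any non-decommissioned replica is preferred
>     }
>     return decommissionedFallback;  // only-replica-on-decommissioned-node case
>   }
>
>   public static void main(String[] args) {
>     // The scenario from this report: node A holds the only replica.
>     List<Replica> replicas =
>         List.of(new Replica("nodeA", NodeState.DECOMMISSIONED));
>     Replica src = chooseSource(replicas);
>     System.out.println(src == null ? "no source" : "copy from " + src.datanodeId);
>   }
> }
> {noformat}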
> 3. Given that the NN has marked the node as decommissioned, admins will shut
> down the datanode, and the under replicated blocks turn into missing blocks.
> 4. The workaround is to recommission the node so that the NN can start
> replicating from it.
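>
> A rough outline of that workaround (the exclude-file location is site-specific,
> set by dfs.hosts.exclude, so the edit step is only described in the comment):
> {noformat}
> # 1. Remove node A's hostname from the file referenced by dfs.hosts.exclude.
> # 2. Ask the NN to re-read the include/exclude host lists:
> hdfs dfsadmin -refreshNodes
> {noformat}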
--
This message was sent by Atlassian JIRA
(v6.2#6252)