[ https://issues.apache.org/jira/browse/HDFS-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14086890#comment-14086890 ]

Hadoop QA commented on HDFS-6791:
---------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12659931/HDFS-6791-2.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test file.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

                  org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
                  org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives
                  org.apache.hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7563//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7563//console

This message is automatically generated.

> A block could remain under replicated if all of its replicas are on 
> decommissioned nodes
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-6791
>                 URL: https://issues.apache.org/jira/browse/HDFS-6791
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: HDFS-6791-2.patch, HDFS-6791-3.patch, HDFS-6791.patch
>
>
> Here is the scenario.
> 1. Normally, before NN transitions a DN to the decommissioned state, enough 
> replicas have been copied to other "in service" DNs. However, in some rare 
> situations, the cluster can get into a state where a DN is decommissioned 
> and a block's only replica is on that DN. In that state, the replication 
> count reported by fsck is 1; the block just stays under replicated; 
> applications can still read the data, since a decommissioned node can still 
> serve read traffic.
> This can happen in some error situations such as DN failure or NN failover. 
> For example:
> a) A block's only replica is temporarily on node A.
> b) The decommission process starts on node A.
> c) While node A is in "decommission-in-progress" state, node A crashes. NN 
> marks node A as dead.
> d) After node A rejoins the cluster, NN marks node A as decommissioned.
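> One way to spot this state (a hedged example; the path is a placeholder, 
> not taken from the original report) is fsck with block locations, which 
> prints the replication count per block:
> {noformat}
> hdfs fsck /path/to/file -files -blocks -locations
> {noformat}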
> 2. In theory, NN should take care of under replicated blocks. But it doesn't 
> for this special case, where the only replica is on a decommissioned node, 
> because NN has the policy that a decommissioned node can't be picked as the 
> source node for replication:
> {noformat}
> BlockManager.java
> chooseSourceDatanode
>       // never use already decommissioned nodes
>       if(node.isDecommissioned())
>         continue;
> {noformat}
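> A minimal sketch of one possible relaxation (illustrative only; the types 
> below are hypothetical simplifications, not the actual BlockManager API): 
> prefer live replicas, but fall back to a decommissioned node when it holds 
> the only remaining replica.
> {noformat}
> // Hypothetical stand-in for the NN's datanode descriptor.
> interface Node {
>   boolean isDecommissioned();
> }
>
> class SourceChooser {
>   // Pick a replication source. Unlike the current policy, a decommissioned
>   // node is allowed as a last resort (assumption: decommissioned nodes can
>   // still serve reads, so they could also serve as copy sources).
>   static Node chooseSource(java.util.List<Node> replicas) {
>     Node decommissionedFallback = null;
>     for (Node node : replicas) {
>       if (node.isDecommissioned()) {
>         decommissionedFallback = node; // remember, but prefer others
>         continue;
>       }
>       return node; // any non-decommissioned replica wins
>     }
>     return decommissionedFallback; // null only if there are no replicas
>   }
> }
> {noformat}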
> 3. Given that NN marks the node as decommissioned, admins will shut down the 
> datanode, and the under replicated blocks turn into missing blocks.
> 4. The workaround is to recommission the node so that NN can start 
> replication from it.
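> A sketch of that workaround on a typical setup (assuming node A was 
> decommissioned via the file referenced by dfs.hosts.exclude; the exact 
> procedure depends on the cluster's configuration):
> {noformat}
> # Remove node A from the excludes file, then tell NN to re-read it:
> hdfs dfsadmin -refreshNodes
> {noformat}
> Once the block is re-replicated to "in service" DNs, node A can be 
> decommissioned again by re-adding it to the excludes file and refreshing.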



--
This message was sent by Atlassian JIRA
(v6.2#6252)
