[ https://issues.apache.org/jira/browse/HDFS-6178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976803#comment-13976803 ]
Hudson commented on HDFS-6178:
------------------------------
SUCCESS: Integrated in Hadoop-Hdfs-trunk #1740 (See
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1740/])
HDFS-6178. Decommission on standby NN couldn't finish. Contributed by Ming Ma.
(jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1589002)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDecommission.java
> Decommission on standby NN couldn't finish
> ------------------------------------------
>
> Key: HDFS-6178
> URL: https://issues.apache.org/jira/browse/HDFS-6178
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: Ming Ma
> Assignee: Ming Ma
> Fix For: 2.5.0
>
> Attachments: HDFS-6178-2.patch, HDFS-6178.patch
>
>
> Currently, decommissioning machines in an HA-enabled cluster requires running
> refreshNodes on both the active and standby NNs. Sometimes decommissioning
> never finishes from the standby NN's point of view. Here is a diagnosis of
> why that can happen.
> The standby NN's BlockManager manages block replication and block
> invalidation as if it were the active NN, even though DNs ignore block
> commands coming from the standby NN. When the standby NN makes block
> operation decisions, such as the target of a block replication or the node
> to remove excess blocks from, it decides independently of the active NN. So
> the active NN and standby NN can end up with different states. When we try
> to decommission nodes on the standby NN, this state inconsistency can
> prevent the standby NN from making progress. Here is an example.
> Machine A
> Machine B
> Machine C
> Machine D
> Machine E
> Machine F
> Machine G
> Machine H
> 1. For a given block, both active and standby have 5 replicas, on machines
> A, B, C, D, E. So both active and standby pick excess nodes to invalidate.
> Active picked D and E as excess DNs. After the next block reports from D and
> E, active NN has 3 active replicas (A, B, C) and 0 excess replicas.
> {noformat}
> 2014-03-27 01:50:14,410 INFO BlockStateChange: BLOCK* chooseExcessReplicates:
> (E:50010, blk_-5207804474559026159_121186764) is added to invalidated blocks
> set
> 2014-03-27 01:50:15,539 INFO BlockStateChange: BLOCK* chooseExcessReplicates:
> (D:50010, blk_-5207804474559026159_121186764) is added to invalidated blocks
> set
> {noformat}
> Standby picked C and E as excess DNs. Because DNs ignore commands from
> standby, C kept its replica, while D and E deleted theirs as instructed by
> active. After the next block reports from C, D, E, standby has 2 active
> replicas (A, B) and 1 excess replica (C).
> {noformat}
> 2014-03-27 01:51:49,543 INFO BlockStateChange: BLOCK* chooseExcessReplicates:
> (E:50010, blk_-5207804474559026159_121186764) is added to invalidated blocks
> set
> 2014-03-27 01:51:49,894 INFO BlockStateChange: BLOCK* chooseExcessReplicates:
> (C:50010, blk_-5207804474559026159_121186764) is added to invalidated blocks
> set
> {noformat}
> 2. A decommission request for machine A was sent to standby. Standby then
> had only one live replica (B) and picked machines G and H as replication
> targets, but because DNs ignored standby's commands, the replications to G
> and H stayed in the pending replication queue until they timed out. At this
> point, standby has 1 decommissioning replica (A), 1 active replica (B), and
> 1 excess replica (C).
> {noformat}
> 2014-03-27 04:42:52,258 INFO BlockStateChange: BLOCK* ask A:50010 to
> replicate blk_-5207804474559026159_121186764 to datanode(s) G:50010 H:50010
> {noformat}
> 3. A decommission request for machine A was sent to active NN. Active NN
> picked machine F as the target, and the replication finished properly. So
> active NN had 3 active replicas (B, C, F) and 1 decommissioned replica (A).
> {noformat}
> 2014-03-27 04:44:15,239 INFO BlockStateChange: BLOCK* ask 10.42.246.110:50010
> to replicate blk_-5207804474559026159_121186764 to datanode(s) F:50010
> 2014-03-27 04:44:16,083 INFO BlockStateChange: BLOCK* addStoredBlock:
> blockMap updated: F:50010 is added to blk_-5207804474559026159_121186764 size
> 7100065
> {noformat}
> 4. Standby NN picked up F as a new replica. Thus standby had 1
> decommissioning replica (A), 2 active replicas (B, F), and 1 excess replica
> (C). Standby NN kept trying to schedule replication work, but DNs ignored
> the commands, so A never finished decommissioning from standby's point of
> view.
> {noformat}
> 2014-03-27 04:44:16,084 INFO BlockStateChange: BLOCK* addStoredBlock:
> blockMap updated: F:50010 is added to blk_-5207804474559026159_121186764 size
> 7100065
> 2014-03-28 23:06:11,970 INFO
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Block:
> blk_-5207804474559026159_121186764, Expected Replicas: 3, live replicas: 2,
> corrupt replicas: 0, decommissioned replicas: 1, excess replicas: 1, Is Open
> File: false, Datanodes having this block: C:50010 B:50010 A:50010 F:50010 ,
> Current Datanode: A:50010, Is current datanode decommissioning: true
> {noformat}
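
The replica bookkeeping in steps 1-4 can be tallied with a small sketch. This
is not Hadoop code: BlockView, its State enum, and hasEnoughLiveReplicas() are
hypothetical names, and the rule "a block stops holding up a decommission once
live replicas reach the expected count" is a simplifying assumption made only
to show why the two NNs disagree.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of one NameNode's view of a single block.
// This is not Hadoop code; every name here is made up for illustration.
final class BlockView {
    enum State { LIVE, EXCESS, DECOMMISSIONING, DECOMMISSIONED }

    private final Map<String, State> replicas = new LinkedHashMap<>();
    private final int expectedReplicas;

    BlockView(int expectedReplicas) {
        this.expectedReplicas = expectedReplicas;
    }

    // Record (or overwrite) the state this NameNode assigns to a replica.
    BlockView put(String node, State state) {
        replicas.put(node, state);
        return this;
    }

    // Number of replicas this NameNode currently tracks in the given state.
    int count(State state) {
        int n = 0;
        for (State s : replicas.values()) {
            if (s == state) {
                n++;
            }
        }
        return n;
    }

    // Assumed rule: the block no longer holds up a decommission once
    // enough live replicas exist. Excess replicas do not count as live.
    boolean hasEnoughLiveReplicas() {
        return count(State.LIVE) >= expectedReplicas;
    }

    // Active NN after step 3: B, C, F live; A decommissioned.
    static BlockView activeAfterStep3() {
        return new BlockView(3)
                .put("A", State.DECOMMISSIONED)
                .put("B", State.LIVE)
                .put("C", State.LIVE)
                .put("F", State.LIVE);
    }

    // Standby NN after step 4: B, F live; C still marked excess;
    // A stuck in decommissioning.
    static BlockView standbyAfterStep4() {
        return new BlockView(3)
                .put("A", State.DECOMMISSIONING)
                .put("B", State.LIVE)
                .put("C", State.EXCESS)
                .put("F", State.LIVE);
    }

    public static void main(String[] args) {
        BlockView active = activeAfterStep3();
        BlockView standby = standbyAfterStep4();
        System.out.println("active:  live=" + active.count(State.LIVE)
                + " enough=" + active.hasEnoughLiveReplicas());
        System.out.println("standby: live=" + standby.count(State.LIVE)
                + " excess=" + standby.count(State.EXCESS)
                + " enough=" + standby.hasEnoughLiveReplicas());
    }
}
```

Under that assumption, both NNs see the same four physical replicas, but
standby counts C as excess rather than live, so it stays one live replica
short of the expected 3.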