On Wed, Apr 9, 2014 at 8:55 AM, Ming Ma (JIRA) <[email protected]> wrote:
>
>      [ https://issues.apache.org/jira/browse/HDFS-6178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Ming Ma updated HDFS-6178:
> --------------------------
>
>     Attachment: HDFS-6178-2.patch
>
> Thanks, Jing. Updated patch per suggestion.
>
> > Decommission on standby NN couldn't finish
> > ------------------------------------------
> >
> >                 Key: HDFS-6178
> >                 URL: https://issues.apache.org/jira/browse/HDFS-6178
> >             Project: Hadoop HDFS
> >          Issue Type: Bug
> >          Components: namenode
> >            Reporter: Ming Ma
> >         Attachments: HDFS-6178-2.patch, HDFS-6178.patch
> >
> >
> > Currently, decommissioning machines in an HA-enabled cluster requires
> > running refreshNodes on both the active and standby NNs. Sometimes
> > decommissioning won't finish from the standby NN's point of view. Here
> > is a diagnosis of why that can happen.
> >
> > The standby NN's blockManager manages block replication and block
> > invalidation as if it were the active NN, even though DNs ignore block
> > commands coming from the standby NN. When the standby NN makes block
> > operation decisions, such as the target of a block replication or the
> > node to remove excess blocks from, its decisions are independent of the
> > active NN's. So the active NN and standby NN can end up with different
> > states. When we try to decommission nodes on the standby NN, such state
> > inconsistency might prevent it from making progress. Here is an example.
> >
> > Machine A
> > Machine B
> > Machine C
> > Machine D
> > Machine E
> > Machine F
> > Machine G
> > Machine H
> >
> > 1. For a given block, both active and standby have 5 replicas, on
> > machines A, B, C, D, E. So both active and standby decide to pick excess
> > nodes to invalidate.
> >
> > Active picked D and E as excess DNs. After the next block reports from D
> > and E, active NN has 3 active replicas (A, B, C) and 0 excess replicas.
> > {noformat}
> > 2014-03-27 01:50:14,410 INFO BlockStateChange: BLOCK* chooseExcessReplicates: (E:50010, blk_-5207804474559026159_121186764) is added to invalidated blocks set
> > 2014-03-27 01:50:15,539 INFO BlockStateChange: BLOCK* chooseExcessReplicates: (D:50010, blk_-5207804474559026159_121186764) is added to invalidated blocks set
> > {noformat}
> >
> > Standby picked C and E as excess DNs. Given that DNs ignore commands
> > from the standby, after the next block reports from C, D, and E, standby
> > has 2 active replicas (A, B) and 1 excess replica (C).
> >
> > {noformat}
> > 2014-03-27 01:51:49,543 INFO BlockStateChange: BLOCK* chooseExcessReplicates: (E:50010, blk_-5207804474559026159_121186764) is added to invalidated blocks set
> > 2014-03-27 01:51:49,894 INFO BlockStateChange: BLOCK* chooseExcessReplicates: (C:50010, blk_-5207804474559026159_121186764) is added to invalidated blocks set
> > {noformat}
> >
> > 2. A decommission request for machine A was sent to the standby. Standby
> > had only one live replica and picked machines G and H as targets, but
> > since standby commands were ignored by DNs, G and H remained in the
> > pending replication queue until they timed out. At this point, standby
> > has 1 decommissioning replica (A), 1 active replica (B), and 1 excess
> > replica (C).
> >
> > {noformat}
> > 2014-03-27 04:42:52,258 INFO BlockStateChange: BLOCK* ask A:50010 to replicate blk_-5207804474559026159_121186764 to datanode(s) G:50010 H:50010
> > {noformat}
> >
> > 3. A decommission request for machine A was sent to the active NN.
> > Active NN picked machine F as the target, and the replication finished
> > properly. So active NN had 3 active replicas (B, C, F) and 1
> > decommissioned replica (A).
> > {noformat}
> > 2014-03-27 04:44:15,239 INFO BlockStateChange: BLOCK* ask 10.42.246.110:50010 to replicate blk_-5207804474559026159_121186764 to datanode(s) F:50010
> > 2014-03-27 04:44:16,083 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: F:50010 is added to blk_-5207804474559026159_121186764 size 7100065
> > {noformat}
> >
> > 4. Standby NN picked up F as a new replica. Thus standby had 1
> > decommissioning replica (A), 2 active replicas (B, F), and 1 excess
> > replica (C). Standby NN kept trying to schedule replication work, but
> > DNs ignored the commands.
> >
> > {noformat}
> > 2014-03-27 04:44:16,084 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: F:50010 is added to blk_-5207804474559026159_121186764 size 7100065
> > 2014-03-28 23:06:11,970 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Block: blk_-5207804474559026159_121186764, Expected Replicas: 3, live replicas: 2, corrupt replicas: 0, decommissioned replicas: 1, excess replicas: 1, Is Open File: false, Datanodes having this block: C:50010 B:50010 A:50010 F:50010 , Current Datanode: A:50010, Is current datanode decommissioning: true
> > {noformat}
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.2#6252)
>
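
To make the divergence in steps 1-4 easier to follow, here is a rough
toy model of it in Python. It is only a sketch and is not HDFS code:
NameNodeView, apply_block_reports, simulate and the "live >= expected"
completion check are all hypothetical names and simplifications that I
made up for illustration. The only behavior taken from the report is
that DNs act on the active NN's decisions and ignore the standby's.

```python
from dataclasses import dataclass, field

EXPECTED_REPLICAS = 3  # "Expected Replicas: 3" in the quoted log line


@dataclass
class NameNodeView:
    """One NN's private bookkeeping for a single block (toy model)."""
    name: str
    stored: set = field(default_factory=set)           # DNs known to hold the block
    excess: set = field(default_factory=set)           # DNs this NN chose as excess
    decommissioning: set = field(default_factory=set)  # DNs being decommissioned

    def live(self):
        # Replicas this NN counts toward the replication factor.
        return self.stored - self.excess - self.decommissioning

    def can_finish_decommission(self):
        # Simplified assumption: decommission completes once enough
        # live replicas exist elsewhere.
        return len(self.live()) >= EXPECTED_REPLICAS


def apply_block_reports(view, truth):
    """Sync a view with what the DNs really hold; keep its own excess picks."""
    view.stored = set(truth)
    view.excess &= view.stored


def simulate():
    truth = set("ABCDE")  # DNs that physically hold the block
    active = NameNodeView("active", stored=set(truth))
    standby = NameNodeView("standby", stored=set(truth))

    # Step 1: each NN picks excess replicas on its own.
    active.excess = {"D", "E"}
    standby.excess = {"C", "E"}
    truth -= active.excess  # DNs execute only the active NN's invalidations
    for view in (active, standby):
        apply_block_reports(view, truth)

    # Step 2: decommission A on standby; its replication to G, H is ignored.
    standby.decommissioning.add("A")

    # Step 3: decommission A on active; its replication to F really happens.
    active.decommissioning.add("A")
    truth.add("F")

    # Step 4: both NNs learn about F from block reports.
    for view in (active, standby):
        apply_block_reports(view, truth)
    return active, standby


if __name__ == "__main__":
    for view in simulate():
        print(view.name, sorted(view.live()), sorted(view.excess),
              view.can_finish_decommission())
```

In this model the active view ends with live replicas B, C, F, while the
standby view is stuck with live replicas B, F plus C still marked as
excess, so its completion check for A never passes. That matches the
"live replicas: 2, ... excess replicas: 1" line in the last quoted log.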
