[ 
https://issues.apache.org/jira/browse/HDFS-6178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13957959#comment-13957959
 ] 

Jing Zhao commented on HDFS-6178:
---------------------------------

Actually we do not need to push the refreshNodes in SBN to after failover? We 
can run refreshNodes in ANN and SBN around the same time and just disable the 
replication monitor for SBN. In that case the only side effect is that SBN can 
have a different internal view about replication.



> Decommission on standby NN couldn't finish
> ------------------------------------------
>
>                 Key: HDFS-6178
>                 URL: https://issues.apache.org/jira/browse/HDFS-6178
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Ming Ma
>
> Currently decommissioning machines in HA-enabled cluster requires running 
> refreshNodes in both active and standby nodes. Sometimes decommissioning 
> won't finish from standby NN's point of view.  Here is the diagnosis of why 
> it could happen.
> Standby NN's blockManager manages blocks replication and block invalidation 
> as if it is the active NN; even though DNs will ignore block commands coming 
> from standby NN. When standby NN makes block operation decisions such as the 
> target of block replication and the node to remove excess blocks from, the 
> decision is independent of active NN. So active NN and standby NN could have 
> different states. When we try to decommission nodes on standby nodes; such 
> state inconsistency might prevent standby NN from making progress. Here is an 
> example.
> Machine A
> Machine B
> Machine C
> Machine D
> Machine E
> Machine F
> Machine G
> Machine H
> 1. For a given block, both active and standby have 5 replicas on machine A, 
> B, C, D, E. So both active and standby decide to pick excess nodes to 
> invalidate.
> Active picked D and E as excess DNs. After the next block reports from D and 
> E, active NN has 3 active replicas (A, B, C), 0 excess replica.
> {noformat}
> 2014-03-27 01:50:14,410 INFO BlockStateChange: BLOCK* chooseExcessReplicates: 
> (E:50010, blk_-5207804474559026159_121186764) is added to invalidated blocks 
> set
> 2014-03-27 01:50:15,539 INFO BlockStateChange: BLOCK* chooseExcessReplicates: 
> (D:50010, blk_-5207804474559026159_121186764) is added to invalidated blocks 
> set
> {noformat}
> Standby pick C, E as excess DNs. Given DNs ignore commands from standby, 
> After the next block reports from C, D, E,  standby has 2 active replicas (A, 
> B), 1 excess replica (C).
> {noformat}
> 2014-03-27 01:51:49,543 INFO BlockStateChange: BLOCK* chooseExcessReplicates: 
> (E:50010, blk_-5207804474559026159_121186764) is added to invalidated blocks 
> set
> 2014-03-27 01:51:49,894 INFO BlockStateChange: BLOCK* chooseExcessReplicates: 
> (C:50010, blk_-5207804474559026159_121186764) is added to invalidated blocks 
> set
> {noformat}
> 2. Machine A decomm request was sent to standby. Standby only had one live 
> replica and picked machine G, H as targets, but given standby commands was 
> ignored by DNs, G, H remained in pending replication queue until they are 
> timed out. At this point, you have one decommissioning replica (A), 1 active 
> replica (B), one excess replica (C).
> {noformat}
> 2014-03-27 04:42:52,258 INFO BlockStateChange: BLOCK* ask A:50010 to 
> replicate blk_-5207804474559026159_121186764 to datanode(s) G:50010 H:50010
> {noformat}
> 3. Machine A decomm request was sent to active NN. Active NN picked machine F 
> as the target. It finished properly. So active NN had 3 active replicas (B, 
> C, F), one decommissioned replica (A).
> {noformat}
> 2014-03-27 04:44:15,239 INFO BlockStateChange: BLOCK* ask 10.42.246.110:50010 
> to replicate blk_-5207804474559026159_121186764 to datanode(s) F:50010
> 2014-03-27 04:44:16,083 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: F:50010 is added to blk_-5207804474559026159_121186764 size 
> 7100065
> {noformat}
> 4. Standby NN picked up F as a new replica. Thus standby had one 
> decommissioning replica (A), 2 active replicas (B, F), one excess replica 
> (C). Standby NN kept trying to schedule replication work, but DNs ignored the 
> commands.
> {noformat}
> 2014-03-27 04:44:16,084 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: F:50010 is added to blk_-5207804474559026159_121186764 size 
> 7100065
> 2014-03-28 23:06:11,970 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Block: 
> blk_-5207804474559026159_121186764, Expected Replicas: 3, live replicas: 2, 
> corrupt replicas: 0, decommissioned replicas: 1, excess replicas: 1, Is Open 
> File: false, Datanodes having this block: C:50010 B:50010 A:50010 F:50010 , 
> Current Datanode: A:50010, Is current datanode decommissioning: true
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to