Kevin Wikant created HDFS-16064:
-----------------------------------
Summary: HDFS-721 causes DataNode decommissioning to get stuck
indefinitely
Key: HDFS-16064
URL: https://issues.apache.org/jira/browse/HDFS-16064
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode, namenode
Affects Versions: 3.2.1
Reporter: Kevin Wikant
It seems that https://issues.apache.org/jira/browse/HDFS-721 was resolved as a
non-issue under the assumption that, if the namenode & a datanode get into an
inconsistent state for a given block pipeline, another datanode will be
available to replicate the block to.
While testing datanode decommissioning via the exclude file ("dfs.hosts.exclude"),
I encountered a scenario where decommissioning gets stuck indefinitely.
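For context, the scale-down was driven by the standard exclude-file workflow; below is a minimal sketch of how it was triggered (the exclude-file path & hostnames are illustrative, and hdfs-site.xml is assumed to already point "dfs.hosts.exclude" at that file):
{code:java}
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.tools.DFSAdmin;
import org.apache.hadoop.util.ToolRunner;

public class StartDecommission {
    public static void main(String[] args) throws Exception {
        // Write the hostnames to decommission into the exclude file
        // (path & hostnames are illustrative)
        Files.write(Paths.get("/etc/hadoop/conf/dfs.exclude"),
                List.of("DN1", "DN2"));
        // Tell the namenode to re-read the include/exclude files;
        // equivalent to: hdfs dfsadmin -refreshNodes
        int rc = ToolRunner.run(new DFSAdmin(new Configuration()),
                new String[] { "-refreshNodes" });
        System.exit(rc);
    }
}
{code}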
Below is the progression of events:
* there are initially 4 datanodes DN1, DN2, DN3, DN4
* scale-down is started by adding DN1 & DN2 to "dfs.exclude.hosts"
* HDFS block(s) on DN1 & DN2 must now be replicated to DN3 & DN4 in order to
satisfy their minimum replication factor of 2
* during this replication process,
https://issues.apache.org/jira/browse/HDFS-721 is encountered, which causes the
following inconsistent state:
** DN3 thinks it has the block pipeline in FINALIZED state
** the namenode does not think DN3 has the block pipeline
{code:java}
2021-06-06 10:38:23,604 INFO org.apache.hadoop.hdfs.server.datanode.DataNode
(DataXceiver for client at /DN2:45654 [Receiving block BP-YYY:blk_XXX]):
DN3:9866:DataXceiver error processing WRITE_BLOCK operation src: /DN2:45654
dst: /DN3:9866;
org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block
BP-YYY:blk_XXX already exists in state FINALIZED and thus cannot be created.
{code}
* the replication is attempted again, but:
** DN4 has the block
** DN1 and/or DN2 have the block, but don't count towards the minimum
replication factor because they are being decommissioned
** DN3 does not have the block & cannot have the block replicated to it
because of HDFS-721
* the namenode repeatedly tries to replicate the block to DN3 & repeatedly
fails; this loop continues indefinitely (a minimal model of the livelock
follows the logs below)
* therefore DN4 is the only live datanode with the block, & the minimum
replication factor of 2 cannot be satisfied
* because the minimum replication factor cannot be satisfied for the block(s)
being moved off DN1 & DN2, the datanode decommissioning never completes
{code:java}
2021-06-06 10:39:10,106 INFO BlockStateChange (DatanodeAdminMonitor-0): Block:
blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0,
decommissioned replicas: 0, decommissioning replicas: 2, maintenance replicas:
0, live entering maintenance replicas: 0, excess replicas: 0, Is Open File:
false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , Current
Datanode: DN1:9866, Is current datanode decommissioning: true, Is current
datanode entering maintenance: false
...
2021-06-06 10:57:10,105 INFO BlockStateChange (DatanodeAdminMonitor-0): Block:
blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0,
decommissioned replicas: 0, decommissioning replicas: 2, maintenance replicas:
0, live entering maintenance replicas: 0, excess replicas: 0, Is Open File:
false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , Current
Datanode: DN2:9866, Is current datanode decommissioning: true, Is current
datanode entering maintenance: false
{code}
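To make the livelock concrete, here is a small self-contained model of the loop; this is NOT Hadoop code, and all names & the control flow are illustrative only. The namenode counts only non-decommissioning replicas, DN3 is the only eligible replication target, and DN3 rejects every write because its replica map already holds blk_XXX in FINALIZED state:
{code:java}
import java.util.HashSet;
import java.util.Set;

public class StuckDecommissionModel {

    // Datanode-side view: DN3's replica map already holds the block in
    // FINALIZED state, so every new WRITE_BLOCK for it is rejected
    // (the ReplicaAlreadyExistsException in the first log above).
    static boolean acceptsWrite(String dn, String block, Set<String> finalizedOnDn3) {
        return !(dn.equals("DN3") && finalizedOnDn3.contains(block));
    }

    public static void main(String[] args) {
        final String block = "blk_XXX";
        final int expectedReplicas = 2;
        Set<String> decommissioning = Set.of("DN1", "DN2");
        // Namenode-side view: the datanodes it believes hold the block.
        Set<String> holders = new HashSet<>(Set.of("DN1", "DN2", "DN4"));
        // The namenode never learned that DN3 finalized the replica.
        Set<String> finalizedOnDn3 = Set.of(block);

        for (int round = 1; round <= 3; round++) {
            // Replicas on decommissioning nodes do not count as live.
            long live = holders.stream()
                    .filter(dn -> !decommissioning.contains(dn)).count();
            if (live >= expectedReplicas) {
                System.out.println("decommissioning can complete");
                return;
            }
            // DN3 is the only non-decommissioning node without the block,
            // so it is chosen as the replication target every time...
            boolean ok = acceptsWrite("DN3", block, finalizedOnDn3);
            System.out.printf("round %d: live=%d < %d, replicate to DN3 -> %s%n",
                    round, live, expectedReplicas,
                    ok ? "ok" : "ReplicaAlreadyExistsException");
            // ...and the write is rejected, so 'holders' never grows.
        }
        System.out.println("DN1 & DN2 remain DECOMMISSION_INPROGRESS forever");
    }
}
{code}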
Being stuck in the decommissioning state forever is not intended behavior for
DataNode decommissioning.
A few potential solutions:
* Address the root cause of the problem, which is the inconsistent replica
state between the namenode & datanode: https://issues.apache.org/jira/browse/HDFS-721
* Detect when datanode decommissioning is stuck due to a lack of datanodes
available to satisfy the minimum replication factor, then recover by
re-enabling the datanodes being decommissioned (a watchdog sketch follows below)
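As a sketch of the detection half of the second option, an operator-side watchdog could poll the public decommissioning report & flag the stuck state. This assumes fs.defaultFS points at HDFS; the polling interval, staleness threshold, and class name are illustrative:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.HdfsConstants.DatanodeReportType;

public class DecommissionWatchdog {
    public static void main(String[] args) throws Exception {
        try (FileSystem fs = FileSystem.get(new Configuration())) {
            DistributedFileSystem dfs = (DistributedFileSystem) fs;
            int previous = -1;
            int unchangedPolls = 0;
            while (true) {
                DatanodeInfo[] inProgress =
                        dfs.getDataNodeStats(DatanodeReportType.DECOMMISSIONING);
                if (inProgress.length == 0) {
                    System.out.println("decommissioning complete");
                    return;
                }
                unchangedPolls = (inProgress.length == previous) ? unchangedPolls + 1 : 0;
                previous = inProgress.length;
                if (unchangedPolls >= 60) { // no progress for ~1 hour of 60s polls
                    System.err.println("decommissioning appears stuck: "
                            + inProgress.length + " datanode(s) still in progress");
                    // recovery (manual or automated) would go here, e.g. removing
                    // the nodes from the exclude file & running -refreshNodes
                    return;
                }
                Thread.sleep(60_000L);
            }
        }
    }
}
{code}
A node-count-based staleness check is crude (the count can stay flat while blocks still move); a production version would more likely track the under-replicated block count per decommissioning node.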