Kevin Wikant created HDFS-17649:
-----------------------------------
Summary: Improve HDFS DataStreamer client to handle datanode
decommissioning
Key: HDFS-17649
URL: https://issues.apache.org/jira/browse/HDFS-17649
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: 3.4.0
Environment: Tested on Hadoop 3.4.0. I believe the limitation still exists on trunk.
Reporter: Kevin Wikant
The HDFS DataStreamer client can handle single datanode failures by
failing over to other datanodes in the block write pipeline.
However, if "dfs.replication=1" & the single datanode in the block write pipeline is
decommissioned, then the HDFS DataStreamer client will not fail over to the new
datanode holding the block replica.
If "dfs.replication>1" then the decommissioned datanode(s) will be removed from
the block write pipeline & new replacement datanode(s) will be requested from
the Namenode.
However, if "dfs.replication=1" then a new replacement datanode will never be
requested from the Namenode. This is counter-intuitive because the block was
successfully replicated to another datanode as part of decommissioning & that
datanode could be returned by the Namenode to the DataStreamer to enable
additional append operations to be successful.
Relevant code:
* [https://github.com/apache/hadoop/blob/7a7b346b0ab60de792ca90dede9ff369fb50d63a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L1723]
* [https://github.com/apache/hadoop/blob/7a7b346b0ab60de792ca90dede9ff369fb50d63a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L1645]
* [https://github.com/apache/hadoop/blob/7a7b346b0ab60de792ca90dede9ff369fb50d63a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L1648]
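For readers who don't want to follow the links, here is a very simplified sketch of the decision the streamer appears to make when a pipeline node goes bad. This is not the actual DataStreamer code; the class and method names below are illustrative only:
{code:java}
import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

// Simplified sketch of the pipeline-recovery decision described above.
// NOT the real DataStreamer code; names are illustrative only.
class PipelineRecoverySketch {
  void onBadDatanode(DatanodeInfo[] pipeline) throws IOException {
    if (pipeline.length <= 1) {
      // With dfs.replication=1 the pipeline only ever contains one datanode,
      // so a decommissioned ("bad") node always lands here and the stream
      // aborts; no replacement datanode is requested from the Namenode.
      throw new IOException("All datanodes " + Arrays.toString(pipeline)
          + " are bad. Aborting...");
    }
    // With dfs.replication>1 the bad node is dropped from the pipeline and,
    // subject to the ReplaceDatanodeOnFailure policy, a replacement datanode
    // is requested from the Namenode.
  }
}
{code}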
Repro Steps:
# Create an HDFS cluster with "dfs.replication=1"
# Create a DataStreamer client & write a file to HDFS
# Identify which datanode the block was written to
# Decommission that datanode & confirm the block was replicated to another datanode where it is still accessible
# Attempt another write through the same DataStreamer client & observe that it always fails with the error below (a rough code sketch of these repro steps follows the quote):
{quote}All datanodes [DatanodeInfoWithStorage[XYZ]] are bad
{quote}
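A minimal repro sketch using the public FileSystem API (the path is a placeholder, the decommission step is manual, and it assumes the output stream stays open across the decommission, which is how I read the steps above):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendAfterDecommissionRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("dfs.replication", "1");      // step 1: single-replica writes

    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/tmp/decommission-repro.txt");  // placeholder path

    try (FSDataOutputStream out = fs.create(file, true)) {
      out.writeBytes("initial data\n");    // step 2: write via DataStreamer
      out.hsync();                         // make the block location visible

      // Steps 3-4 (manual): find the datanode holding the block, e.g. with
      // `hdfs fsck /tmp/decommission-repro.txt -files -blocks -locations`,
      // decommission it, and wait for the replica to move to another datanode.

      // Step 5: the next write through the same DataStreamer fails with
      // "All datanodes [DatanodeInfoWithStorage[...]] are bad" even though
      // the block replica is still available on the new datanode.
      out.writeBytes("written after decommission\n");
      out.hsync();
    }
  }
}
{code}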
Suggestion:
* it seems to me the DataStreamer client's assumption here is that the block replica was lost because the datanode went "bad" unexpectedly
* however, this assumption does not hold when the datanode is gracefully decommissioned & the block is replicated to another datanode
* I think the DataStreamer client could be updated to request a replacement datanode even when "dfs.replication=1", relying on the Namenode as the source of truth for whether the block replica is still available somewhere (with the required generation stamp, etc...). A rough sketch of this idea follows below.
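To make the suggestion concrete, a very rough sketch of the proposed behavior. This is not a patch; requestReplacementFromNamenode() is a hypothetical helper standing in for whatever Namenode RPC would actually be used:
{code:java}
import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.LocatedBlock;

// Rough sketch of the suggested change; names are illustrative only.
class PipelineRecoverySuggestionSketch {
  void onBadDatanode(DatanodeInfo[] pipeline) throws IOException {
    if (pipeline.length <= 1) {
      // Proposed: ask the Namenode (the source of truth) whether the block
      // still has a live replica instead of assuming it was lost.
      LocatedBlock updated = requestReplacementFromNamenode(); // hypothetical helper
      if (updated == null || updated.getLocations().length == 0) {
        // Only abort if no replica with the required generation stamp exists.
        throw new IOException("All datanodes " + Arrays.toString(pipeline)
            + " are bad. Aborting...");
      }
      // Otherwise rebuild the write pipeline around the datanode(s) returned
      // by the Namenode and let the append continue.
      return;
    }
    // Existing behavior for dfs.replication>1 is unchanged.
  }

  private LocatedBlock requestReplacementFromNamenode() {
    return null; // placeholder for a Namenode RPC; intentionally not implemented
  }
}
{code}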