Kevin Wikant created HDFS-17649:
-----------------------------------

             Summary: Improve HDFS DataStreamer client to handle datanode 
decommissioning
                 Key: HDFS-17649
                 URL: https://issues.apache.org/jira/browse/HDFS-17649
             Project: Hadoop HDFS
          Issue Type: Improvement
    Affects Versions: 3.4.0
         Environment: Tested on Hadoop 3.4.0

I think the limitation still exists on trunk, though.
            Reporter: Kevin Wikant


The HDFS DataStreamer client can handle single datanode failures by failing 
over to the other datanodes in the block write pipeline.

However, if "dfs.replication=1" & the single datanode in the block write 
pipeline is decommissioned, then the HDFS DataStreamer client will not fail 
over to the new datanode holding the block replica.

If "dfs.replication>1" then the decommissioned datanode(s) will be removed from 
the block write pipeline & new replacement datanode(s) will be requested from 
the Namenode.

However, if "dfs.replication=1" then a new replacement datanode will never be 
requested from the Namenode. This is counter-intuitive because the block was 
successfully replicated to another datanode as part of decommissioning, & that 
datanode could be returned by the Namenode to the DataStreamer so that 
subsequent append operations would succeed.
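
To illustrate the asymmetry, here is a minimal, self-contained sketch of the 
pipeline-recovery decision as I read it; the class and helper names are 
placeholders, not the real hadoop-hdfs-client API (the actual logic is around 
the DataStreamer lines linked below):

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the current pipeline-recovery decision (placeholder
// names only; the real logic is in DataStreamer's pipeline setup/recovery).
class PipelineRecoverySketch {
  private final int dfsReplication;
  private final List<String> pipeline; // datanodes currently in the write pipeline

  PipelineRecoverySketch(int dfsReplication, List<String> pipeline) {
    this.dfsReplication = dfsReplication;
    this.pipeline = new ArrayList<>(pipeline);
  }

  void recoverFrom(String badDatanode) throws IOException {
    // Drop the bad (or decommissioned) datanode from the local pipeline view.
    pipeline.remove(badDatanode);

    if (dfsReplication > 1) {
      // With replication > 1 the client asks the Namenode for a replacement
      // datanode (subject to the replace-datanode-on-failure policy) and
      // transfers the partial block to it.
      pipeline.add(askNamenodeForReplacement());
    }

    if (pipeline.isEmpty()) {
      // With replication = 1 no replacement is ever requested, so once the
      // single datanode is gone the write fails permanently, even though a
      // decommission-driven replica of the block may exist elsewhere.
      throw new IOException("All datanodes [" + badDatanode + "] are bad. Aborting...");
    }
  }

  private String askNamenodeForReplacement() {
    // Stand-in for the replacement-datanode request to the Namenode.
    return "replacement-datanode";
  }
}
{code}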

Relevant code:
 * 
[https://github.com/apache/hadoop/blob/7a7b346b0ab60de792ca90dede9ff369fb50d63a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L1723]
 * 
[https://github.com/apache/hadoop/blob/7a7b346b0ab60de792ca90dede9ff369fb50d63a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L1645]
 * 
[https://github.com/apache/hadoop/blob/7a7b346b0ab60de792ca90dede9ff369fb50d63a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L1648]
 

Repro Steps:
 # Create an HDFS cluster with "dfs.replication=1"
 # Create a DataStreamer client & write a file to HDFS
 # Identify what datanode the block was written to
 # Decommission that datanode & confirm the block was replicated to another 
datanode where it is still accessible
 # Attempt to append with the DataStreamer client again & observe that it 
always fails with the error below (a minimal client sketch follows the quote):

{quote}All datanodes [DatanodeInfoWithStorage[XYZ]] are bad
{quote}
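For reference, one way to drive this repro from a client, assuming a cluster 
reachable at hdfs://namenode:8020 (the URI and path are placeholders; the 
decommission in steps 3-4 happens out of band via dfs.hosts.exclude & "hdfs 
dfsadmin -refreshNodes"):

{code:java}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationOneAppendRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("dfs.replication", "1"); // single-replica writes

    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
    Path path = new Path("/tmp/repro.txt");

    // Step 2: write the file with a single replica (DataStreamer runs under the hood).
    try (FSDataOutputStream out = fs.create(path, (short) 1)) {
      out.writeBytes("initial data\n");
    }

    // Steps 3-4 happen out of band: find the block location with
    // "hdfs fsck /tmp/repro.txt -files -blocks -locations", then decommission
    // that datanode via dfs.hosts.exclude + "hdfs dfsadmin -refreshNodes".

    // Step 5: this append fails with "All datanodes [...] are bad" even though
    // the block replica now lives on another, live datanode.
    try (FSDataOutputStream out = fs.append(path)) {
      out.writeBytes("appended data\n");
    }
  }
}
{code}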
Suggestion:
 * it seems to me that this assumption in the DataStreamer client is based on 
the block replica being lost because the datanode went "bad" unexpectedly
 * however, that assumption does not hold when the datanode is gracefully 
decommissioned & the block is replicated to another datanode
 * I think the DataStreamer client could be updated to request a replacement 
datanode even when "dfs.replication=1" and rely on the Namenode as the source 
of truth for whether the block replica is still available somewhere (with the 
required generation stamp, etc...); a rough sketch of the idea follows this list
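
Continuing the placeholder sketch above, the suggested behaviour might look 
roughly like the following (again hypothetical names; the real change would 
live in DataStreamer's pipeline-recovery path):

{code:java}
// Suggested variation on PipelineRecoverySketch#recoverFrom (placeholder names):
// when the pipeline becomes empty, ask the Namenode whether a replica with the
// expected generation stamp still exists before giving up.
void recoverFrom(String badDatanode) throws IOException {
  pipeline.remove(badDatanode);

  if (dfsReplication > 1) {
    pipeline.add(askNamenodeForReplacement()); // existing behaviour
  }

  if (pipeline.isEmpty()) {
    // Proposed: treat the Namenode as the source of truth. If decommissioning
    // replicated the block elsewhere, the Namenode can return that datanode
    // and the append can continue instead of failing permanently.
    String replacement = askNamenodeForReplacement();
    if (replacement != null) {
      pipeline.add(replacement);
      return;
    }
    throw new IOException("All datanodes [" + badDatanode + "] are bad. Aborting...");
  }
}
{code}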


