Kevin Wikant created HDFS-17649:
-----------------------------------

             Summary: Improve HDFS DataStreamer client to handle datanode decommissioning
                 Key: HDFS-17649
                 URL: https://issues.apache.org/jira/browse/HDFS-17649
             Project: Hadoop HDFS
          Issue Type: Improvement
    Affects Versions: 3.4.0
         Environment: Tested on Hadoop 3.4.0
I think the limitation still exists on trunk, though.
            Reporter: Kevin Wikant


The HDFS DataStreamer client can handle single datanode failures by failing over to other datanodes in the block write pipeline. However, if "dfs.replication=1" & the one datanode in the block write pipeline is decommissioned, the HDFS DataStreamer client will not fail over to the new datanode holding the block replica.

If "dfs.replication>1" then the decommissioned datanode(s) will be removed from the block write pipeline & replacement datanode(s) will be requested from the Namenode. However, if "dfs.replication=1" then a replacement datanode is never requested from the Namenode. This is counter-intuitive because the block was successfully replicated to another datanode as part of decommissioning, & that datanode could be returned by the Namenode to the DataStreamer so that subsequent append operations succeed.

Relevant code:
 * [https://github.com/apache/hadoop/blob/7a7b346b0ab60de792ca90dede9ff369fb50d63a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L1723]
 * [https://github.com/apache/hadoop/blob/7a7b346b0ab60de792ca90dede9ff369fb50d63a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L1645]
 * [https://github.com/apache/hadoop/blob/7a7b346b0ab60de792ca90dede9ff369fb50d63a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L1648]

Repro Steps (a hedged client-side sketch of these steps is included at the end of this description):
 # Create an HDFS cluster with "dfs.replication=1"
 # Create a DataStreamer client & write a file to HDFS
 # Identify which datanode the block was written to
 # Decommission that datanode & confirm the block was replicated to another datanode, where it is still accessible
 # Attempt another write with the DataStreamer client & observe that it always fails with:
{quote}All datanodes [DatanodeInfoWithStorage[XYZ]] are bad
{quote}
Suggestion:
 * it seems this assumption in the DataStreamer client is based on the block being lost because the datanode went "bad" unexpectedly
 * however, the assumption does not hold when the datanode is gracefully decommissioned & the block is replicated to another datanode
 * I think the DataStreamer client could be updated to request a replacement datanode even when "dfs.replication=1", relying on the Namenode as the source of truth for whether the block replica is still available somewhere (with the required generation stamp, etc.); a simplified sketch of the current abort path & this suggested direction follows the repro sketch below
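To make the repro concrete, here is a minimal client-side sketch of steps 2 and 5 above, assuming a running cluster configured with "dfs.replication=1". The file path, payload, and use of hflush() are illustrative only, and the decommission in steps 3-4 is done out of band (add the datanode to the dfs.hosts.exclude file, then run "hdfs dfsadmin -refreshNodes"):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.nio.charset.StandardCharsets;

public class AppendAfterDecommissionRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("dfs.replication", "1");           // step 1: single-replica writes
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/tmp/repro.txt");     // hypothetical test path

    // Step 2: the initial write builds a one-datanode pipeline.
    try (FSDataOutputStream out = fs.create(file, (short) 1)) {
      out.write("first write\n".getBytes(StandardCharsets.UTF_8));
    }

    // Steps 3-4 happen outside this client, e.g.:
    //   hdfs fsck /tmp/repro.txt -files -blocks -locations    (find the datanode)
    //   add that datanode to dfs.hosts.exclude, then: hdfs dfsadmin -refreshNodes
    //   wait until the node reports DECOMMISSIONED and the block has a live
    //   replica on another datanode.

    // Step 5: the append pipeline still points at the decommissioned datanode;
    // with only one node in the pipeline the DataStreamer gives up with
    // "All datanodes [...] are bad" instead of asking the Namenode for the
    // datanode that now holds the replica.
    try (FSDataOutputStream out = fs.append(file)) {
      out.write("second write\n".getBytes(StandardCharsets.UTF_8));
      out.hflush();
    }
  }
}
{code}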
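For reference, here is a heavily simplified, self-contained paraphrase of the single-node abort behaviour around the DataStreamer lines linked above, together with the direction the suggestion points in. The class and field names are placeholders, not the actual hadoop-hdfs-client internals:

{code:java}
import java.io.IOException;
import java.util.Arrays;

// Sketch only: paraphrases the behaviour around the linked DataStreamer lines.
// "nodes" stands in for the datanodes currently in the block write pipeline.
class PipelineRecoverySketch {
  private String[] nodes;

  void handleBadDatanode() throws IOException {
    if (nodes.length <= 1) {
      // Current behaviour: with a one-node pipeline there is no surviving node
      // to fail over to, so the client aborts -- the error quoted in repro step 5.
      throw new IOException(
          "All datanodes " + Arrays.toString(nodes) + " are bad. Aborting...");
    }
    // With dfs.replication > 1 the bad node is dropped here and, depending on the
    // dfs.client.block.write.replace-datanode-on-failure.* settings, a replacement
    // datanode is requested from the Namenode.
  }

  // Suggested direction (hypothetical, per the bullets above): even when
  // nodes.length == 1, ask the Namenode whether a live replica of the block still
  // exists -- it does after a graceful decommission -- and rebuild the pipeline on
  // that datanode (checking the generation stamp) instead of aborting the stream.
}
{code}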