Shangshu Qian created HDFS-17837:
------------------------------------

Summary: Potential Feedback Loop in Write Pipeline
Key: HDFS-17837
URL: https://issues.apache.org/jira/browse/HDFS-17837
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode
Affects Versions: 2.10.2
Reporter: Shangshu Qian
We found that delays in pipeline recovery operations can cause block recovery to fail, resulting in workload amplification through the following feedback loop:

-> Pipeline rebuilds put extra workload on the DataNodes (DNs) in the cluster.
-> The pipeline rebuilds contend with block recovery operations, which are also inter-DataNode operations.
-> Failed block recoveries trigger extra retries, raising DN load further.
-> Incremental block report (IBR) reporting in the heartbeat is delayed by IOEs (IOExceptions) caused by the congestion.
-> Write pipelines fail because the IBRs are delayed, which causes more pipeline rebuilds and closes the loop.
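The following is a toy model of the loop described above. It is a minimal sketch and is not HDFS code. All of the following are hypothetical assumptions chosen for illustration:

- the abstract "work units" of DN load;
- the linear cost of rebuilds and retries;
- the rule that a block recovery's failure probability equals DN utilization;
- the rule that only work beyond capacity delays IBRs.

Under these assumptions, a small burst of pipeline rebuilds decays to zero. A burst past an unstable threshold keeps growing every round. With the default parameters that threshold is 12 rebuilds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopParams:
    # Hypothetical parameters for illustration only; none of these are HDFS settings.
    capacity: float = 100.0              # DN work per round before congestion sets in
    base_load: float = 40.0              # steady background load on the DNs
    cost_per_rebuild: float = 5.0        # DN work caused by one pipeline rebuild
    recoveries_per_rebuild: float = 1.0  # block recoveries contending with each rebuild
    cost_per_retry: float = 5.0          # DN work caused by retrying one failed recovery
    work_per_late_ibr: float = 5.0       # excess work that delays one IBR
    rebuilds_per_late_ibr: float = 1.0   # pipeline failures caused by one delayed IBR


def simulate(initial_rebuilds: float, rounds: int,
             p: LoopParams = LoopParams()) -> list[float]:
    """Return pipeline rebuilds per round in a toy model of the feedback loop."""
    rebuilds = initial_rebuilds
    history = [rebuilds]
    for _ in range(rounds):
        # 1. Pipeline rebuilds add work to the DNs.
        load = p.base_load + p.cost_per_rebuild * rebuilds
        # 2. Block recoveries contend with the rebuilds; failure odds track utilization.
        fail_prob = min(1.0, load / p.capacity)
        failed = fail_prob * p.recoveries_per_rebuild * rebuilds
        # 3. Failed recoveries are retried, raising the load further.
        load += p.cost_per_retry * failed
        # 4. Only work beyond capacity delays IBRs (IOEs from congestion).
        late_ibrs = max(0.0, load - p.capacity) / p.work_per_late_ibr
        # 5. Delayed IBRs fail write pipelines, which are rebuilt next round.
        rebuilds = p.rebuilds_per_late_ibr * late_ibrs
        history.append(rebuilds)
    return history


print(simulate(10, 5))  # [10, 7.0, 0.25, 0.0, 0.0, 0.0]: the burst dies out
print(simulate(20, 5))  # [20, 28.0, 44.0, 76.0, 140.0, 268.0]: the burst runs away
```

The key modeling choice is the load-dependent failure probability in step 2. It makes the loop gain itself grow with load. In this model, that is what lets an ordinary transient stay harmless while a larger one becomes self-sustaining.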