Shangshu Qian created HDFS-17837:
------------------------------------

             Summary: Potential Feedback Loop in Write Pipeline
                 Key: HDFS-17837
                 URL: https://issues.apache.org/jira/browse/HDFS-17837
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
    Affects Versions: 2.10.2
            Reporter: Shangshu Qian


We find that a delay in the pipeline recovery operations may cause block 
recovery to fail, resulting in workload amplification.

Pipeline rebuilds cause extra workload on the DNs in the cluster.
-> The pipeline rebuilds can contend with block recovery operations, which 
are also inter-datanode operations, and can make those recoveries fail.
-> The failed block recoveries may trigger extra retries, raising the DN load further.
-> The IBR (incremental block report) sent with the heartbeat is delayed by IOEs 
caused by the congestion.
-> The write pipeline fails because the IBR is delayed, causing more pipeline 
rebuilds.
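
To illustrate why this loop can amplify load, here is a toy model. It is not 
HDFS code and is not derived from measurements. It assumes that every pipeline 
rebuild in one round leads, through the chain above, to a fixed average number 
of rebuilds in the next round. The class name FeedbackLoopSketch, the method 
simulate, and the parameter loopGain are all hypothetical names introduced only 
for this sketch.

```java
public class FeedbackLoopSketch {

    /**
     * Toy model of the loop: rebuilds -> contention -> failed recovery
     * -> retries -> delayed IBR -> pipeline failure -> more rebuilds.
     *
     * loopGain is an ASSUMED parameter: the average number of new pipeline
     * rebuilds that one rebuild causes in the next round. It is not an
     * HDFS setting and has not been measured.
     *
     * Returns the number of rebuilds in each round; index 0 is the
     * initial round.
     */
    public static double[] simulate(double initialRebuilds, double loopGain, int rounds) {
        double[] rebuilds = new double[rounds + 1];
        rebuilds[0] = initialRebuilds;
        for (int i = 1; i <= rounds; i++) {
            // Each rebuild in the previous round causes loopGain rebuilds now.
            rebuilds[i] = rebuilds[i - 1] * loopGain;
        }
        return rebuilds;
    }

    public static void main(String[] args) {
        // Compare a damped loop (gain below 1) with an amplifying one (gain above 1).
        for (double gain : new double[] {0.5, 1.5}) {
            double[] r = simulate(10.0, gain, 5);
            System.out.printf("loopGain=%.1f: rebuilds in round 5 = %.4f%n", gain, r[5]);
        }
    }
}
```

Under this assumption the loop dies out when loopGain is below 1 and grows 
without bound when it is above 1, which is the workload amplification described 
above. Whether a real cluster crosses that threshold depends on factors this 
sketch does not model, such as retry limits and timeouts.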


