[
https://issues.apache.org/jira/browse/HDFS-7270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307808#comment-14307808
]
Haohui Mai commented on HDFS-7270:
----------------------------------
bq. Could you please elaborate on how you intend to implement its use in a
follow-up jira? I'd like to evaluate whether your approach will improve or
exacerbate current issues in our environment. How will a DN signal congestion?
When will it signal congestion? I.e., in a premature ack, since a prior ack
easily becomes stale? What will the client do?
To signal congestion, the DN will toggle the ECN flag in the pipeline ack. The
client will back off if it sees the ECN flag.
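As a rough illustration of the flag-and-back-off idea, here is a minimal Java
sketch. All names (AckSketch, WriterSketch, delayAfter) are hypothetical and
are not taken from the patch or from the HDFS code base; the sketch only
assumes that each ack carries a single congestion bit.

```java
// Hypothetical sketch; these classes are NOT the actual HDFS pipeline classes.
final class AckSketch {
    final long seqno;          // sequence number of the acknowledged packet
    final boolean congested;   // assumed ECN-style flag set by a loaded DN

    AckSketch(long seqno, boolean congested) {
        this.seqno = seqno;
        this.congested = congested;
    }
}

final class WriterSketch {
    private final long backoffMillis;

    WriterSketch(long backoffMillis) {
        this.backoffMillis = backoffMillis;
    }

    /** Returns how long the client should pause before sending more packets. */
    long delayAfter(AckSketch ack) {
        // Back off only when the ack carries the congestion flag.
        return ack.congested ? backoffMillis : 0L;
    }
}
```

A client loop would call delayAfter() on each received ack and sleep for the
returned duration before sending the next packet.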
One scenario we have tested is that (1) the DN signals congestion when the
system load is greater than a pre-defined threshold (e.g., 2 * the number of
processors), and (2) the client backs off for a fixed amount of time (e.g.,
5s). We found that with these changes HDFS can survive heavy loads over long
periods (e.g., loading several hundred TBs of data into a 7-node cluster in
24h). We're evaluating using the length of the I/O queues to signal congestion
and implementing exponential back-off in the client.
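The threshold check and the back-off policy could look roughly like the Java
sketch below. The class and method names are hypothetical, and the doubling
schedule with a cap is only one possible form of exponential back-off, not
necessarily what a follow-up patch will do. The load average comes from the
standard OperatingSystemMXBean, which returns a negative value on platforms
where it is unavailable.

```java
import java.lang.management.ManagementFactory;

// Hypothetical sketch; names and defaults are assumptions, not the patch.
final class CongestionPolicySketch {

    /** True when the load average exceeds factor * processors. */
    static boolean isCongested(double loadAverage, int processors, double factor) {
        // A negative load average means "not available"; treat it as not congested.
        return loadAverage >= 0 && loadAverage > factor * processors;
    }

    /** Applies the example threshold (2 * number of processors) to the local node. */
    static boolean isLocalNodeCongested() {
        double load = ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
        int cpus = Runtime.getRuntime().availableProcessors();
        return isCongested(load, cpus, 2.0);
    }

    /** Doubles the previous back-off, starting at initial and capped at max. */
    static long nextBackoffMillis(long current, long initial, long max) {
        if (current <= 0) {
            return initial;
        }
        return Math.min(max, current * 2);
    }
}
```

With an initial delay of 5s and a cap of 60s, successive congested acks would
produce delays of 5s, 10s, 20s, 40s, and then 60s; the client would reset the
delay to zero once it sees an ack without the flag.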
> Add congestion signaling capability to DataNode write protocol
> --------------------------------------------------------------
>
> Key: HDFS-7270
> URL: https://issues.apache.org/jira/browse/HDFS-7270
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Haohui Mai
> Assignee: Haohui Mai
> Attachments: HDFS-7270.000.patch, HDFS-7270.001.patch,
> HDFS-7270.002.patch, HDFS-7270.003.patch, HDFS-7270.004.patch
>
>
> When a client writes to HDFS faster than the disk bandwidth of the DNs, it
> saturates the disk bandwidth and makes the DNs unresponsive. The client only
> backs off by aborting / recovering the pipeline, which leads to failed writes
> and unnecessary pipeline recovery.
> This jira proposes to add explicit congestion control mechanisms in the
> writing pipeline.