[
https://issues.apache.org/jira/browse/HDFS-16293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yuanxin Zhu updated HDFS-16293:
-------------------------------
Description:
When I enable ECN and run Terasort as a test, the DataNodes become
congested (HDFS-8008). After receiving ACKs many times, the client goes to
sleep but does not release the 'dataQueue' lock. The ResponseProcessor thread
needs the 'dataQueue' lock to execute 'ackQueue.getFirst()', so it waits for
the client to release 'dataQueue'. In effect the ResponseProcessor thread
sleeps as well, which delays the ACKs. MapReduce tasks can be delayed by tens
of minutes or even hours.
The DataStreamer thread could instead execute 'one = dataQueue.getFirst()'
first, release 'dataQueue', and then decide whether to call
'backOffIfNecessary()' based on 'one.isHeartbeatPacket()'.
was:
When I open the ECN and use Terasort for testing, datanodes are
congested([HDFS-8008|https://issues.apache.org/jira/browse/HDFS-8008]). The
client enters the sleep state after receiving the ACK for many times, but does
not release the 'dataqueue'. The ResponseProcessor thread needs the 'dataqueue'
to execute 'ackqueue. getfirst()', so the ResponseProcessor will wait for the
client to release the 'dataqueue', which is equivalent to that the
ResponseProcessor thread also enters sleep, resulting in ack delay.MapReduce
tasks can be delayed by tens of minutes or even hours
> Client sleep and hold 'dataqueue' when datanode are condensed
> -------------------------------------------------------------
>
> Key: HDFS-16293
> URL: https://issues.apache.org/jira/browse/HDFS-16293
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs-client
> Affects Versions: 3.2.2
> Reporter: Yuanxin Zhu
> Priority: Major
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> When I enable ECN and run Terasort as a test, the DataNodes become
> congested (HDFS-8008). After receiving ACKs many times, the client goes to
> sleep but does not release the 'dataQueue' lock. The ResponseProcessor
> thread needs the 'dataQueue' lock to execute 'ackQueue.getFirst()', so it
> waits for the client to release 'dataQueue'. In effect the
> ResponseProcessor thread sleeps as well, which delays the ACKs. MapReduce
> tasks can be delayed by tens of minutes or even hours.
> The DataStreamer thread could instead execute 'one = dataQueue.getFirst()'
> first, release 'dataQueue', and then decide whether to call
> 'backOffIfNecessary()' based on 'one.isHeartbeatPacket()'.
>
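The reordering proposed in the description can be illustrated with a minimal,
self-contained sketch. This is not the actual DataStreamer code: the class
name, the String packets, the 'congested' flag and the 'backoffMillis'
parameter are all hypothetical stand-ins. It only shows the idea of picking
the packet while holding the 'dataQueue' monitor and sleeping after the
monitor has been released, so that a thread doing the ResponseProcessor's
lookup is not blocked for the length of the back-off.

```java
import java.util.LinkedList;

public class BackoffOutsideLockSketch {
    // Hypothetical marker for a heartbeat packet.
    public static final String HEARTBEAT = "heartbeat";

    // Stand-ins for the client's queues. Both are guarded by the dataQueue
    // monitor, which is the contention point the description refers to.
    private final LinkedList<String> dataQueue = new LinkedList<>();
    private final LinkedList<String> ackQueue = new LinkedList<>();

    public void enqueue(String packet) {
        synchronized (dataQueue) {
            dataQueue.addLast(packet);
        }
    }

    // Records a packet as sent and still waiting for its ACK.
    public void markSent(String packet) {
        synchronized (dataQueue) {
            ackQueue.addLast(packet);
        }
    }

    // Streamer side: choose the packet under the lock, back off outside it.
    public String nextPacket(boolean congested, long backoffMillis) {
        String one;
        synchronized (dataQueue) {
            one = dataQueue.isEmpty() ? HEARTBEAT : dataQueue.getFirst();
        } // monitor released here, before any sleep
        if (congested && !HEARTBEAT.equals(one)) {
            try {
                Thread.sleep(backoffMillis); // stand-in for the back-off
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return one;
    }

    // Responder side: needs the same monitor only for a short lookup.
    // peekFirst() is used so an empty queue yields null instead of throwing.
    public String firstPendingAck() {
        synchronized (dataQueue) {
            return ackQueue.peekFirst();
        }
    }

    public static void main(String[] args) throws Exception {
        BackoffOutsideLockSketch s = new BackoffOutsideLockSketch();
        s.enqueue("packet-1");
        s.markSent("packet-0");
        Thread streamer = new Thread(() -> s.nextPacket(true, 2000));
        streamer.start();
        Thread.sleep(200); // give the streamer time to start backing off
        long start = System.nanoTime();
        String ack = s.firstPendingAck();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(ack + " read after " + elapsedMs + " ms");
        streamer.join();
    }
}
```

If the sleep were moved inside the synchronized block in 'nextPacket', the
lookup in 'main' would have to wait for the whole back-off period, which is
the behaviour reported above.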