[
https://issues.apache.org/jira/browse/HDFS-16292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448595#comment-17448595
]
tomscut edited comment on HDFS-16292 at 11/25/21, 12:33 AM:
------------------------------------------------------------
Finally, by running *strace* on the client, we can see that the timeout
parameter is indeed set successfully.
!image-2021-11-24-21-40-23-793.png|width=605,height=57!
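For reference, the effect of a read timeout at the socket level can be sketched as follows. This is only an illustration with plain java.net sockets on the loopback interface, not the HDFS Peer or BlockReaderFactory code; the class and method names (ReadTimeoutDemo, readTimesOut) are made up for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class; it does not use any Hadoop API.
final class ReadTimeoutDemo {

    /**
     * Connects to a local server that accepts the connection but never
     * writes anything, then tries to read one byte.
     * Returns true if the read is aborted by the socket read timeout.
     * timeoutMs must be greater than 0 (0 means "wait forever").
     */
    static boolean readTimesOut(int timeoutMs) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket silentPeer = server.accept()) {
            // Without this call, the read below would block indefinitely.
            client.setSoTimeout(timeoutMs);
            try {
                client.getInputStream().read();
                return false; // the peer sent data or closed the connection
            } catch (SocketTimeoutException e) {
                return true;  // the read gave up after timeoutMs
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

Note that a timeout like this only bounds a single blocking read. A peer that keeps answering, just slowly, never triggers it, which matches the behaviour described in this comment.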
The problem we encountered was that the load on the DataNode was high and
lock contention was fierce (we had not yet merged the patches HDFS-13359,
[HDFS-15150|https://issues.apache.org/jira/browse/HDFS-15150], and
[HDFS-15160|https://issues.apache.org/jira/browse/HDFS-15160]). As a result,
{*}epoll_wait{*} became extremely slow, taking seconds to return, so the
DataNode responded very slowly.
!image-2021-11-24-21-04-26-568.png|width=700,height=302!
In addition, Spark sets a very small buffer size (512 bytes) when reading ORC
files, which makes file reads slow. This makes it look as though the read is
waiting forever.
!image-2021-11-24-21-06-49-172.png|width=701,height=127!
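To give a rough idea of why the buffer size matters, the sketch below counts how many read calls are needed to drain 1 MiB with a 512-byte buffer versus a 64 KiB buffer. It uses an in-memory stream as a stand-in for the DFS input stream, so it shows only the call count, not real HDFS latency; the class name BufferSizeDemo, the method countReads, and the 64 KiB comparison size are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class; an in-memory stream stands in for the DFS input stream.
final class BufferSizeDemo {

    /** Drains the stream with a buffer of bufferSize bytes and returns the number of reads that returned data. */
    static long countReads(InputStream in, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        byte[] buffer = new byte[bufferSize];
        long calls = 0;
        try {
            while (in.read(buffer) != -1) {
                calls++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return calls;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1024 * 1024]; // 1 MiB of stand-in data
        long small = countReads(new ByteArrayInputStream(data), 512);
        long large = countReads(new ByteArrayInputStream(data), 64 * 1024);
        System.out.println("512-byte buffer: " + small + " read calls"); // 2048
        System.out.println("64 KiB buffer:   " + large + " read calls"); // 16
    }
}
```

If every read call has to wait on a slow DataNode response, the total time grows with the number of calls, so a 512-byte buffer can turn a slow read into one that appears to hang.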
> The DFS Input Stream is waiting to be read
> ------------------------------------------
>
> Key: HDFS-16292
> URL: https://issues.apache.org/jira/browse/HDFS-16292
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 2.5.2
> Reporter: Hualong Zhang
> Priority: Minor
> Attachments: HDFS-16292.path, image-2021-11-01-18-36-54-329.png,
> image-2021-11-02-08-54-27-273.png, image-2021-11-24-21-04-26-568.png,
> image-2021-11-24-21-06-26-064.png, image-2021-11-24-21-06-49-172.png,
> image-2021-11-24-21-40-23-793.png
>
>
> The input stream has been waiting. The problem seems to be that
> BlockReaderPeer#peer does not set ReadTimeout and WriteTimeout. We can solve
> this problem by setting the timeout in BlockReaderFactory#nextTcpPeer.
> The jstack output is as follows:
> !image-2021-11-01-18-36-54-329.png!