[
https://issues.apache.org/jira/browse/HDFS-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314868#comment-14314868
]
Chris Nauroth commented on HDFS-7608:
-------------------------------------
Please let me know if I'm missing something, but it appears this patch would
significantly alter the pre-existing write timeout behavior of the HDFS client.
Right now, the write timeout is enforced not as a socket option but per
operation, by passing the timeout to {{SocketOutputStream}}, which uses it in
the underlying NIO selector calls. The exact write timeout value is
not purely based on configuration. It's also a function of the number of nodes
in the write pipeline. The details are implemented in
{{DFSClient#getDatanodeWriteTimeout}}. Under default configuration, this
method would extend the configured timeout of 60 seconds to 75 seconds (an
additional 5 seconds for each of the 3 replicas in the pipeline). Extending the timeout
proportional to the pipeline size is meant to make the client robust against
the cumulative latency effects of every write in the pipeline.
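The pipeline-proportional extension described above can be sketched as follows. This is a simplified illustration, not Hadoop's actual code; the class name and constants are assumptions based on the values in this comment (60 second base timeout, 5 seconds of extension per pipeline node):

```java
// Sketch of the shape of DFSClient#getDatanodeWriteTimeout, assuming the
// default values mentioned above. Names and constants are illustrative.
public class WriteTimeoutSketch {
    // Assumed default configured write timeout, in milliseconds (60 s).
    static final int BASE_WRITE_TIMEOUT_MS = 60_000;
    // Assumed extension per datanode in the write pipeline (5 s).
    static final int WRITE_TIMEOUT_EXTENSION_MS = 5_000;

    /**
     * Returns the effective write timeout for a pipeline of numNodes
     * datanodes: the configured base plus a per-node extension, so the
     * client tolerates cumulative latency across the whole pipeline.
     */
    static int getDatanodeWriteTimeout(int numNodes) {
        return BASE_WRITE_TIMEOUT_MS > 0
                ? BASE_WRITE_TIMEOUT_MS + WRITE_TIMEOUT_EXTENSION_MS * numNodes
                : 0; // a configured timeout of 0 disables the timeout
    }

    public static void main(String[] args) {
        // Default replication factor of 3 yields the 75 s discussed above.
        System.out.println(getDatanodeWriteTimeout(3)); // 75000
    }
}
```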
This patch would instead set a 60 second write timeout (under default
configuration) directly as a socket option. I believe that effectively negates
the extension to 75 seconds that {{DFSClient#getDatanodeWriteTimeout}} was
trying to allow.
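The distinction matters because a per-operation, selector-based timeout can take a different value on every call, while a socket option is fixed once on the connection. A minimal sketch of the selector technique, using a {{Pipe}} in place of a real socket channel (this is a simplified illustration of the general NIO pattern, not Hadoop's actual {{SocketOutputStream}} implementation):

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

public class PerOpWriteTimeout {
    /**
     * Writes buf to a non-blocking channel, failing if the channel is not
     * ready for writing within timeoutMs. Each call can use its own timeout,
     * which is how a per-pipeline-size timeout can be applied per operation.
     */
    static int writeWithTimeout(Pipe.SinkChannel ch, ByteBuffer buf,
                                long timeoutMs) throws Exception {
        try (Selector sel = Selector.open()) {
            ch.register(sel, SelectionKey.OP_WRITE);
            if (sel.select(timeoutMs) == 0) {
                // No readiness within the window: per-operation timeout.
                throw new java.net.SocketTimeoutException(
                        "write not ready within " + timeoutMs + " ms");
            }
            return ch.write(buf);
        }
    }

    public static void main(String[] args) throws Exception {
        Pipe pipe = Pipe.open();
        pipe.sink().configureBlocking(false); // required before register()
        int n = writeWithTimeout(pipe.sink(),
                ByteBuffer.wrap("hi".getBytes()), 1000);
        System.out.println("wrote " + n + " bytes");
    }
}
```

With a socket option instead, the timeout is chosen once at connect time, before the client knows how long the write pipeline will be.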
I see the original problem reported in HDFS-7005 was related to the lack of a
read timeout. Given the above explanation, I'm wondering whether any further
change is actually required for the write timeout. Is anyone seeing an actual
problem related to the lack of a write timeout?
> hdfs dfsclient newConnectedPeer has no write timeout
> -----------------------------------------------------
>
> Key: HDFS-7608
> URL: https://issues.apache.org/jira/browse/HDFS-7608
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: dfsclient, fuse-dfs
> Affects Versions: 2.3.0, 2.6.0
> Environment: hdfs 2.3.0 hbase 0.98.6
> Reporter: zhangshilong
> Assignee: Xiaoyu Yao
> Labels: patch
> Attachments: HDFS-7608.0.patch, HDFS-7608.1.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> problem:
> hbase compactSplitThread may lock forever while reading datanode blocks.
> debug found: the epoll wait timeout is set to 0, so the epoll wait can never
> time out.
> cause: in hdfs 2.3.0,
> hbase uses DFSClient to read and write blocks.
> DFSClient creates a socket using newConnectedPeer(addr), but sets no read
> or write timeout.
> In 2.6.0, newConnectedPeer added a readTimeout to deal with the problem,
> but did not add a writeTimeout. Why was a write timeout not added?
> I think NioInetPeer needs a default socket timeout, so applications will not
> need to force a timeout themselves.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)