[ 
https://issues.apache.org/jira/browse/HDFS-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314868#comment-14314868
 ] 

Chris Nauroth commented on HDFS-7608:
-------------------------------------

Please let me know if I'm missing something, but it appears this patch would 
significantly alter the pre-existing write timeout behavior of the HDFS client.

Right now, the write timeout is enforced not as a socket option but per 
operation, by passing the timeout to {{SocketOutputStream}}, which uses it in 
the underlying NIO selector calls.  The exact write timeout value is 
not purely based on configuration.  It's also a function of the number of nodes 
in the write pipeline.  The details are implemented in 
{{DFSClient#getDatanodeWriteTimeout}}.  Under default configuration (a write 
pipeline of 3 replicas), this method extends the configured timeout of 60 
seconds to 75 seconds (an additional 5 seconds per replica in the pipeline).  
Extending the timeout 
proportional to the pipeline size is meant to make the client robust against 
the cumulative latency effects of every write in the pipeline.
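The arithmetic above can be sketched as follows.  This is an illustrative 
model of the behavior described, not a copy of the HDFS source; the constant 
names and the treatment of a 0 ("no timeout") configuration are assumptions.

```java
// Sketch of the per-pipeline write timeout extension described above.
// Assumed defaults: a 60 second configured write timeout, extended by
// 5 seconds for each datanode in the write pipeline.
public class WriteTimeoutSketch {
    static final int WRITE_TIMEOUT = 60 * 1000;           // assumed default, ms
    static final int WRITE_TIMEOUT_EXTENSION = 5 * 1000;  // per pipeline node, ms

    static int getDatanodeWriteTimeout(int numNodes) {
        // A configured timeout of 0 conventionally means "no timeout",
        // so it is passed through unextended.
        return WRITE_TIMEOUT > 0
                ? WRITE_TIMEOUT + WRITE_TIMEOUT_EXTENSION * numNodes
                : 0;
    }

    public static void main(String[] args) {
        // Default replication factor of 3: 60s + 3 * 5s = 75s.
        System.out.println(getDatanodeWriteTimeout(3)); // 75000
    }
}
```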

This patch would set a 60 second write timeout (under default configuration) 
directly as a socket option.  I believe that effectively negates the extension 
to 75 seconds that {{DFSClient#getDatanodeWriteTimeout}} was trying to allow.
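For contrast, the per-operation style of timeout enforcement that 
{{SocketOutputStream}} relies on looks roughly like this.  This is a 
self-contained sketch using a {{Pipe}} in place of a real datanode socket; the 
method name and structure are assumptions for illustration, not the actual 
HDFS implementation.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

public class PerWriteTimeout {
    // Waits up to timeoutMs for the channel to become writable before each
    // write, throwing if the deadline passes.  The timeout bounds every
    // individual operation rather than being a fixed property of the socket,
    // so the caller can pass a different (e.g. pipeline-extended) value.
    static int writeWithTimeout(Pipe.SinkChannel ch, ByteBuffer buf, long timeoutMs)
            throws IOException {
        try (Selector sel = Selector.open()) {
            ch.register(sel, SelectionKey.OP_WRITE);
            if (sel.select(timeoutMs) == 0) {
                throw new IOException("write timed out after " + timeoutMs + " ms");
            }
            return ch.write(buf);
        }
    }

    public static void main(String[] args) throws IOException {
        Pipe pipe = Pipe.open();
        pipe.sink().configureBlocking(false);  // selectable channels must be non-blocking
        int n = writeWithTimeout(pipe.sink(), ByteBuffer.wrap("hi".getBytes()), 1000);
        System.out.println(n); // 2
    }
}
```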

I see the original problem reported in HDFS-7005 was related to lack of read 
timeout.  I'm wondering if there is actually no further change required for 
write timeout, given the above explanation.  Is anyone seeing an actual problem 
related to lack of write timeout?

> hdfs dfsclient  newConnectedPeer has no write timeout
> -----------------------------------------------------
>
>                 Key: HDFS-7608
>                 URL: https://issues.apache.org/jira/browse/HDFS-7608
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: dfsclient, fuse-dfs
>    Affects Versions: 2.3.0, 2.6.0
>         Environment: hdfs 2.3.0  hbase 0.98.6
>            Reporter: zhangshilong
>            Assignee: Xiaoyu Yao
>              Labels: patch
>         Attachments: HDFS-7608.0.patch, HDFS-7608.1.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> problem:
> hbase compactSplitThread may block forever on reads of datanode blocks.
> debug found: the epoll_wait timeout is set to 0, so epoll_wait can never 
> time out.
> cause: in hdfs 2.3.0,
> hbase uses DFSClient to read and write blocks.
> DFSClient creates a socket via newConnectedPeer(addr), but sets no read or 
> write timeout.
> In 2.6.0, newConnectedPeer added a readTimeout to deal with the problem, 
> but did not add a writeTimeout.  Why was a write timeout not added?
> I think NioInetPeer needs a default socket timeout, so applications will 
> not need to set the timeout themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
