[
https://issues.apache.org/jira/browse/HDFS-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315270#comment-14315270
]
Colin Patrick McCabe commented on HDFS-7608:
--------------------------------------------
Chris, I think you're absolutely right. I vaguely remembered that there was an
alternate method of setting write timeouts we used in places, but I was unable
to find it in a few minutes of digging. The fact that it's passed as a
parameter to {{NetUtils#getOutputStream}} explains why looking for
{{setWriteTimeout}} and similar didn't turn up anything.
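For context, part of why grepping for a {{setWriteTimeout}} turns up nothing is that {{java.net.Socket}} itself only exposes a read-side timeout ({{SO_TIMEOUT}}); there is no write-timeout setter in the JDK socket API, which is why Hadoop has to thread the write timeout through {{NetUtils#getOutputStream}} as a parameter instead. A standalone sketch (plain JDK, not Hadoop code):

```java
import java.net.Socket;

public class SoTimeoutDemo {
    public static void main(String[] args) throws Exception {
        Socket s = new Socket();
        // SO_TIMEOUT applies only to blocking reads; java.net.Socket has no
        // corresponding write-timeout setter, so a per-stream timeout must be
        // passed explicitly when the output stream is created.
        s.setSoTimeout(60_000);
        System.out.println("SO_TIMEOUT (read) ms: " + s.getSoTimeout());
        s.close();
    }
}
```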
However, I still think this is broken, because we do some writes to the
socket prior to calling {{DFSClient#getDataNodeWriteTimeout}}. For example,
{{RemoteBlockReader2#newBlockReader}} writes to the socket before
{{DFSClient#getDataNodeWriteTimeout}} is ever called.
On a semi-related note, I think the current configuration situation is
highly confusing and unsatisfactory. We have a configuration key called simply
{{dfs.client.socket-timeout}}, which doesn't specify whether it applies to
reads or writes. I'm not even sure most HDFS developers could say which
one(s) this key governs, if quizzed. Meanwhile, the units are unspecified
(seconds? milliseconds?) and the default value doesn't appear in
{{DFSConfigKeys.java}}, unlike almost every other configuration key.
How about having {{dfs.client.datanode.socket.read.timeout.ms}} as an alias for
{{dfs.client.socket-timeout}},
{{dfs.client.datanode.socket.write.timeout.ms}} for a base write timeout, and
{{dfs.client.datanode.socket.write.timeout.extra.per.pipeline.node.ms}} to be
an extra amount that we add for each DN in the pipeline?
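Concretely, the proposal above might look like this in {{hdfs-site.xml}} (these key names are the suggestion, not keys that exist today, and the values are illustrative placeholders):

```xml
<!-- Proposed keys only; dfs.client.socket-timeout is the existing key. -->
<property>
  <name>dfs.client.datanode.socket.read.timeout.ms</name>
  <value>60000</value> <!-- alias for dfs.client.socket-timeout -->
</property>
<property>
  <name>dfs.client.datanode.socket.write.timeout.ms</name>
  <value>480000</value> <!-- base write timeout -->
</property>
<property>
  <name>dfs.client.datanode.socket.write.timeout.extra.per.pipeline.node.ms</name>
  <value>5000</value> <!-- extra amount added per DN in the pipeline -->
</property>
```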
> hdfs dfsclient newConnectedPeer has no write timeout
> -----------------------------------------------------
>
> Key: HDFS-7608
> URL: https://issues.apache.org/jira/browse/HDFS-7608
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: dfsclient, fuse-dfs
> Affects Versions: 2.3.0, 2.6.0
> Environment: hdfs 2.3.0 hbase 0.98.6
> Reporter: zhangshilong
> Assignee: Xiaoyu Yao
> Labels: patch
> Attachments: HDFS-7608.0.patch, HDFS-7608.1.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> problem:
> hbase compactSplitThread may lock forever when reading datanode blocks.
> debugging found: the epoll_wait timeout was set to 0, so epoll_wait never
> times out.
> cause: in hdfs 2.3.0,
> hbase uses DFSClient to read and write blocks.
> DFSClient creates a socket using newConnectedPeer(addr), but sets no read
> or write timeout.
> In v2.6.0, newConnectedPeer added a readTimeout to deal with the
> problem, but did not add a writeTimeout. Why was a write timeout not added?
> I think NioInetPeer needs a default socket timeout, so applications will
> not need to force a timeout themselves.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)