[
https://issues.apache.org/jira/browse/HDFS-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315270#comment-14315270
]
Colin Patrick McCabe commented on HDFS-7608:
--------------------------------------------
Chris, I think you're absolutely right. I vaguely remembered that there was an
alternate method of setting write timeouts we used in places, but I was unable
to find it in a few minutes of digging. The fact that it's passed as a
parameter to {{NetUtils#getOutputStream}} explains why looking for
{{setWriteTimeout}} and similar didn't turn up anything.
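For context, part of why grepping for a {{setWriteTimeout}} turns up nothing is that {{java.net.Socket}} itself only exposes a read-side timeout ({{SO_TIMEOUT}}); there is no write-timeout setter in the JDK socket API, which is why Hadoop has to thread the write timeout through {{NetUtils#getOutputStream}} as a parameter instead. A standalone sketch (plain JDK, not Hadoop code):

```java
import java.net.Socket;

public class SoTimeoutDemo {
    public static void main(String[] args) throws Exception {
        Socket s = new Socket();
        // SO_TIMEOUT applies only to blocking reads; java.net.Socket has no
        // corresponding write-timeout setter, so a per-stream timeout must be
        // passed explicitly when the output stream is created.
        s.setSoTimeout(60_000);
        System.out.println("SO_TIMEOUT (read) ms: " + s.getSoTimeout());
        s.close();
    }
}
```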
However, I still think this is broken, because we do some writes to the
socket prior to calling {{DFSClient#getDataNodeWriteTimeout}}. For example,
{{RemoteBlockReader2#newBlockReader}} writes to the socket before
{{DFSClient#getDataNodeWriteTimeout}} is ever called.
On a semi-related note, I think the current configuration situation is
highly confusing and unsatisfactory. We have a configuration key called simply
{{dfs.client.socket-timeout}}, which doesn't specify whether it applies to
reads or writes. I'm not even sure most HDFS developers could say which
one(s) this key governs, if quizzed. Meanwhile, the units are unspecified
(seconds? milliseconds?) and the default value doesn't appear in
{{DFSConfigKeys.java}}, unlike almost every other configuration key.
How about having {{dfs.client.datanode.socket.read.timeout.ms}} as an alias for
{{dfs.client.socket-timeout}},
{{dfs.client.datanode.socket.write.timeout.ms}} for a base write timeout, and
{{dfs.client.datanode.socket.write.timeout.extra.per.pipeline.node.ms}} to be
an extra amount that we add for each DN in the pipeline?
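Concretely, the proposal above might look like this in {{hdfs-site.xml}} (these key names are the suggestion, not keys that exist today, and the values are illustrative placeholders):

```xml
<!-- Proposed keys only; dfs.client.socket-timeout is the existing key. -->
<property>
  <name>dfs.client.datanode.socket.read.timeout.ms</name>
  <value>60000</value> <!-- alias for dfs.client.socket-timeout -->
</property>
<property>
  <name>dfs.client.datanode.socket.write.timeout.ms</name>
  <value>480000</value> <!-- base write timeout -->
</property>
<property>
  <name>dfs.client.datanode.socket.write.timeout.extra.per.pipeline.node.ms</name>
  <value>5000</value> <!-- extra amount added per DN in the pipeline -->
</property>
```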
> hdfs dfsclient newConnectedPeer has no write timeout
> -----------------------------------------------------
>
> Key: HDFS-7608
> URL: https://issues.apache.org/jira/browse/HDFS-7608
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: dfsclient, fuse-dfs
> Affects Versions: 2.3.0, 2.6.0
> Environment: hdfs 2.3.0 hbase 0.98.6
> Reporter: zhangshilong
> Assignee: Xiaoyu Yao
> Labels: patch
> Attachments: HDFS-7608.0.patch, HDFS-7608.1.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> problem:
> hbase compactSplitThread may lock forever when reading datanode blocks.
> debugging found: the epoll_wait timeout was set to 0, so epoll_wait never
> times out.
> cause: in hdfs 2.3.0,
> hbase uses DFSClient to read and write blocks.
> DFSClient creates a socket using newConnectedPeer(addr), but sets no read
> or write timeout.
> In v2.6.0, newConnectedPeer added a readTimeout to deal with the
> problem, but did not add a writeTimeout. Why was a write timeout not added?
> I think NioInetPeer needs a default socket timeout, so applications will
> not need to force a timeout themselves.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)