Wilfred Spiegelenburg created HADOOP-11252:
----------------------------------------------
Summary: RPC client write does not time out by default
Key: HADOOP-11252
URL: https://issues.apache.org/jira/browse/HADOOP-11252
Project: Hadoop Common
Issue Type: Bug
Components: ipc
Affects Versions: 2.5.0
Reporter: Wilfred Spiegelenburg
The RPC client uses a default timeout of 0 when no timeout is passed in.
This means that the network connection it creates will not time out when used to
write data. The issue has shown up in YARN-2578 and HDFS-4858. Timeouts for writes
then fall back to the TCP-level retry (configured via tcp_retries2), which gives
timeouts of 15 to 30 minutes. That is too long for a default behaviour.
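To illustrate where the lower end of that range comes from, here is a rough sketch of the retransmission arithmetic. It assumes the usual Linux values, none of which are stated in this issue: tcp_retries2 = 15, a minimum retransmission timeout of 200 ms, and a cap of 120 s, with the timeout doubling after each attempt. The class and method names are made up for the illustration.

```java
public class TcpRetryTimeout {
    // Assumed Linux defaults (not taken from this issue).
    static final double RTO_MIN_SECONDS = 0.2;
    static final double RTO_MAX_SECONDS = 120.0;

    /**
     * Sums the wait before each retransmission plus the wait after the last
     * one: the RTO starts at the minimum and doubles up to the cap.
     */
    static double hypotheticalTimeoutSeconds(int retries) {
        double total = 0.0;
        double rto = RTO_MIN_SECONDS;
        for (int i = 0; i <= retries; i++) {
            total += rto;
            rto = Math.min(rto * 2, RTO_MAX_SECONDS);
        }
        return total;
    }

    public static void main(String[] args) {
        double seconds = hypotheticalTimeoutSeconds(15);
        System.out.println(seconds + " s, about " + (seconds / 60.0) + " min");
    }
}
```

With these assumptions the sum is roughly 924.6 seconds, a little over 15 minutes. The real retransmission timeout is derived from the measured round-trip time, so this is only a lower bound and the effective timeout on a given connection can be considerably longer.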
Using 0 as the default value for the timeout is incorrect. We should use a sane
value for the timeout, and the "ipc.ping.interval" configuration value is a
logical choice for it. The default behaviour should be changed from 0 to the
ping interval value read from the Configuration.
Fixing it in common makes more sense than finding and changing all other points
in the code that do not pass in a timeout.
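A minimal sketch of the proposed fallback, not the actual patch: a plain Map stands in for the Hadoop Configuration so the example runs on its own, and the class and method names are hypothetical. The 60000 ms fallback for "ipc.ping.interval" is assumed to be the shipped default.

```java
import java.util.HashMap;
import java.util.Map;

public class RpcTimeoutDefault {
    static final String PING_INTERVAL_KEY = "ipc.ping.interval";
    // Assumed default ping interval of one minute, in milliseconds.
    static final int PING_INTERVAL_DEFAULT_MS = 60000;

    /** Reads the ping interval from the stand-in configuration map. */
    static int pingInterval(Map<String, String> conf) {
        String value = conf.get(PING_INTERVAL_KEY);
        return value == null ? PING_INTERVAL_DEFAULT_MS : Integer.parseInt(value.trim());
    }

    /**
     * Returns the caller's timeout when one was passed in, otherwise the
     * ping interval instead of 0 (which would mean "never time out").
     */
    static int effectiveTimeout(int requestedMs, Map<String, String> conf) {
        return requestedMs > 0 ? requestedMs : pingInterval(conf);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(effectiveTimeout(0, conf));     // no timeout passed in
        System.out.println(effectiveTimeout(5000, conf));  // explicit timeout wins
    }
}
```

Doing the substitution in one place in common means callers that never pass a timeout pick up the bounded value without being changed individually.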
Offending code lines:
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/RPC.java#L488
and
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/RPC.java#L350
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)