Chris Nauroth created MAPREDUCE-5616:
----------------------------------------
Summary: MR Client-AppMaster RPC max retries on socket timeout is too high.
Key: MAPREDUCE-5616
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5616
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: client
Affects Versions: 2.2.0, 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
MAPREDUCE-3811 introduced a separate config key for overriding the max retries
applied to RPC connections from the MapReduce Client to the MapReduce
Application Master. This was done to make failover from the AM to the
MapReduce History Server faster in the event that the AM completes while the
client thinks it's still running. However, the RPC client uses a separate
retry setting for socket timeouts, and that setting is not overridden. Its
default is 45 retries with a 20-second timeout on each retry (45 x 20 s =
900 s). This means that in environments where connection attempts time out
instead of being refused, the client waits about 15 minutes before failing
over.
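
As a rough check of the figure above, here is a minimal sketch of the
arithmetic in Java. The class and method names are hypothetical and are not
part of Hadoop. The retry count of 3 in the second call is only an
illustrative lower value, not a proposed default. The sketch counts one full
timeout per retry and ignores the initial attempt and any sleep between
retries.

```java
import java.time.Duration;

// Hypothetical helper, not Hadoop code: estimates how long a client spends
// on connection attempts when every attempt ends in a socket timeout.
class FailoverWaitEstimate {

    // One full connect timeout is spent per retry; the initial attempt and
    // any pause between retries are deliberately left out of the estimate.
    static Duration worstCaseWait(int maxRetriesOnTimeouts, Duration connectTimeout) {
        return connectTimeout.multipliedBy(maxRetriesOnTimeouts);
    }

    public static void main(String[] args) {
        // Defaults quoted in the description: 45 retries, 20 seconds each.
        Duration defaults = worstCaseWait(45, Duration.ofSeconds(20));
        System.out.println(defaults.toMinutes() + " minutes"); // prints "15 minutes"

        // Illustrative lower retry count (an assumption, not a Hadoop default).
        Duration reduced = worstCaseWait(3, Duration.ofSeconds(20));
        System.out.println(reduced.getSeconds() + " seconds"); // prints "60 seconds"
    }
}
```

The wait scales linearly with the retry count, so overriding the
retries-on-timeout value for Client-to-AM connections, as MAPREDUCE-3811 did
for the ordinary max retries, would shorten failover proportionally.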