Chris Nauroth created MAPREDUCE-5616:
----------------------------------------

             Summary: MR Client-AppMaster RPC max retries on socket timeout is too high.
                 Key: MAPREDUCE-5616
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5616
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: client
    Affects Versions: 2.2.0, 3.0.0
            Reporter: Chris Nauroth
            Assignee: Chris Nauroth


MAPREDUCE-3811 introduced a separate config key for overriding the max retries 
applied to RPC connections from the MapReduce Client to the MapReduce 
Application Master.  This was done to make failover from the AM to the 
MapReduce History Server faster in the event that the AM completes while the 
client thinks it's still running.  However, the RPC client uses a separate 
setting for max retries on socket timeouts, and this one is not overridden.  The 
default for this is 45 retries with a 20-second timeout on each retry.  This 
means that in environments where connection attempts time out instead of being 
refused, the client waits up to 15 minutes (45 x 20 seconds = 900 seconds) 
before failing over.
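The 15-minute figure comes from multiplying the retry count by the per-attempt timeout.  The sketch below reproduces that arithmetic.  It assumes every attempt runs for the full connect timeout, and it ignores any extra initial attempt or sleep between retries, so it is an approximation.  The config key names in the comments are taken from Hadoop's core-default.xml and should be treated as assumptions.  The helper name worst_case_wait_seconds is hypothetical.

```python
# Sketch: approximate worst-case time an MR client spends retrying a
# connection to an unreachable AM before it fails over to the History Server.
# Assumes every attempt times out after the full connect timeout; ignores any
# extra initial attempt and any sleep between retries.

# Assumed defaults (Hadoop core-default.xml):
#   ipc.client.connect.max.retries.on.timeouts = 45
#   ipc.client.connect.timeout                 = 20000 (milliseconds)
DEFAULT_MAX_RETRIES_ON_TIMEOUTS = 45
DEFAULT_CONNECT_TIMEOUT_MS = 20_000


def worst_case_wait_seconds(max_retries: int, connect_timeout_ms: int) -> float:
    """Hypothetical helper: total seconds spent across all timed-out retries."""
    if max_retries < 0 or connect_timeout_ms < 0:
        raise ValueError("retries and timeout must be non-negative")
    return max_retries * connect_timeout_ms / 1000.0


if __name__ == "__main__":
    total = worst_case_wait_seconds(DEFAULT_MAX_RETRIES_ON_TIMEOUTS,
                                    DEFAULT_CONNECT_TIMEOUT_MS)
    print(f"{total:.0f} s = {total / 60:.0f} min")  # 900 s = 15 min
    # Hypothetical override: cap timeout retries for the client-AM connection at 3.
    print(f"{worst_case_wait_seconds(3, DEFAULT_CONNECT_TIMEOUT_MS):.0f} s")  # 60 s
```

The retry cap of 3 in the last line is chosen purely for illustration and is not a documented default.  It shows how a lower, client-AM-specific cap on timeout retries would shrink the failover delay linearly.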



--
This message was sent by Atlassian JIRA
(v6.1#6144)
