Matt Cheah created SPARK-5697:
---------------------------------

             Summary: Allow Spark driver to wait longer before giving up connecting to the master
                 Key: SPARK-5697
                 URL: https://issues.apache.org/jira/browse/SPARK-5697
             Project: Spark
          Issue Type: Improvement
          Components: Deploy
    Affects Versions: 1.2.0, 1.1.1
            Reporter: Matt Cheah
             Fix For: 1.4.0


In the AppClient class, the driver is hard-coded to try connecting to the
master 3 times, 20 seconds apart, before giving up and killing the job.

In practice, some clusters see heavy traffic and resource contention, and jobs
in such environments may need to wait longer before giving up. A longer wait
spares users from resubmitting jobs that failed only because registration took
too long. A busy, unreliable network may also delay message delivery.

I suggest simply making the timeout and the number of retries for driver
registration configurable.
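
To make the proposal concrete, here is a minimal sketch of what configurable
registration retries could look like. It is written in Python for brevity and
is not Spark code: the two configuration keys, the RegistrationPolicy class and
the register_with_master function are all hypothetical names invented for this
illustration. Only the defaults (3 attempts, 20 seconds apart) come from the
behaviour described above.

```python
import time
from dataclasses import dataclass
from typing import Callable, Mapping

# Defaults taken from the behaviour described in this issue.
DEFAULT_MAX_ATTEMPTS = 3
DEFAULT_RETRY_INTERVAL_SECONDS = 20.0

# Hypothetical configuration keys; the issue does not name any settings.
MAX_ATTEMPTS_KEY = "example.registration.maxAttempts"
RETRY_INTERVAL_KEY = "example.registration.retryIntervalSeconds"


@dataclass(frozen=True)
class RegistrationPolicy:
    max_attempts: int = DEFAULT_MAX_ATTEMPTS
    retry_interval_seconds: float = DEFAULT_RETRY_INTERVAL_SECONDS

    @classmethod
    def from_conf(cls, conf: Mapping[str, str]) -> "RegistrationPolicy":
        """Read the policy from a key/value config, falling back to defaults."""
        return cls(
            max_attempts=int(conf.get(MAX_ATTEMPTS_KEY, DEFAULT_MAX_ATTEMPTS)),
            retry_interval_seconds=float(
                conf.get(RETRY_INTERVAL_KEY, DEFAULT_RETRY_INTERVAL_SECONDS)
            ),
        )


def register_with_master(
    try_register: Callable[[], bool],
    policy: RegistrationPolicy,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call try_register up to policy.max_attempts times.

    Waits policy.retry_interval_seconds between attempts and returns True as
    soon as one attempt succeeds, or False once all attempts are used up.
    """
    for attempt in range(1, policy.max_attempts + 1):
        if try_register():
            return True
        if attempt < policy.max_attempts:
            sleep(policy.retry_interval_seconds)
    return False
```

With an empty configuration the policy keeps today's values, so existing jobs
would behave as before. A job on a congested cluster could instead set, say,
10 attempts or a 60 second interval. The sleep function is injectable only so
the loop can be exercised without real delays.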



