Matt Cheah created SPARK-5697:
---------------------------------
Summary: Allow Spark driver to wait longer before giving up
connecting to the master
Key: SPARK-5697
URL: https://issues.apache.org/jira/browse/SPARK-5697
Project: Spark
Issue Type: Improvement
Components: Deploy
Affects Versions: 1.2.0, 1.1.1
Reporter: Matt Cheah
Fix For: 1.4.0
In the AppClient class, the driver attempts to connect to the master 3 times,
20 seconds apart, before giving up and killing the job.
In practice, some clusters carry heavy traffic and resource contention, and
jobs in such environments may want to wait longer before giving up. Waiting
longer would spare users from resubmitting jobs that failed only because
registration took too long. A busy, unreliable network may also delay message
propagation.
I suggest making both the timeout and the number of retries for driver
registration configurable.
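To illustrate the idea, here is a minimal sketch of a registration loop whose
attempt count and wait between attempts come from settings instead of
constants. It is not the AppClient code: the names RegistrationSettings,
register_with_master, and try_register are hypothetical, and only the defaults
(3 attempts, 20 seconds) are taken from the behavior described above.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RegistrationSettings:
    """Hypothetical settings object; these are not real Spark config keys."""
    # Defaults mirror the behavior described in this issue.
    max_attempts: int = 3
    retry_interval_seconds: float = 20.0


def register_with_master(
    try_register: Callable[[], bool],
    settings: RegistrationSettings = RegistrationSettings(),
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call try_register until it succeeds or attempts run out.

    Returns the 1-based number of the attempt that succeeded and raises
    TimeoutError if every attempt fails. try_register stands in for
    whatever actually contacts the master.
    """
    if settings.max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, settings.max_attempts + 1):
        if try_register():
            return attempt
        # Wait between attempts, but not after the final one.
        if attempt < settings.max_attempts:
            sleep(settings.retry_interval_seconds)
    raise TimeoutError(
        f"could not register after {settings.max_attempts} attempts"
    )
```

On a congested cluster, a job could then pass something like
RegistrationSettings(max_attempts=6, retry_interval_seconds=60.0) and wait
longer before being killed, while other jobs keep the current defaults.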