Yogesh Natarajan created SPARK-24075:
----------------------------------------
Summary: [Mesos] Supervised driver upon failure will be retried
indefinitely unless explicitly killed
Key: SPARK-24075
URL: https://issues.apache.org/jira/browse/SPARK-24075
Project: Spark
Issue Type: Improvement
Components: Mesos
Affects Versions: 2.3.0
Reporter: Yogesh Natarajan
If supervise is enabled, MesosClusterScheduler retries a failing driver
indefinitely. This ties up cluster resources, which are freed only when the
driver is explicitly killed.
The proposed solution is to introduce a Spark configuration,
"spark.driver.supervise.maxRetries", which allows the maximum number of retries
to be specified while preserving the default behavior of retrying the driver
indefinitely when it is not set.
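
A minimal sketch of the retry decision this proposal implies, written in Python
for illustration only. It is not the MesosClusterScheduler code; the function
name should_retry and its parameters are hypothetical, and only the
configuration key "spark.driver.supervise.maxRetries" comes from this issue. An
unset limit is modeled as None and keeps today's unbounded retries.

```python
from typing import Optional


def should_retry(supervise: bool,
                 retries_so_far: int,
                 max_retries: Optional[int] = None) -> bool:
    """Decide whether a failed driver should be relaunched.

    Hypothetical helper: max_retries stands in for the proposed
    spark.driver.supervise.maxRetries setting. None means "not set".
    """
    if not supervise:
        # Unsupervised drivers are never relaunched.
        return False
    if max_retries is None:
        # Default behavior preserved: retry indefinitely.
        return True
    # Bounded case: stop once the configured number of retries is used up.
    return retries_so_far < max_retries
```

With a limit of 3, a supervised driver would be relaunched after its first,
second, and third failures, then left failed so that its resources are released
without a manual kill.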