[
https://issues.apache.org/jira/browse/SPARK-31418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apache Spark reassigned SPARK-31418:
------------------------------------
Assignee: Apache Spark
> Blacklisting feature aborts Spark job without retrying for max num retries in
> case of Dynamic allocation
> --------------------------------------------------------------------------------------------------------
>
> Key: SPARK-31418
> URL: https://issues.apache.org/jira/browse/SPARK-31418
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.3.0, 2.4.5
> Reporter: Venkata krishnan Sowrirajan
> Assignee: Apache Spark
> Priority: Major
>
> With Spark blacklisting enabled, if a task fails on an executor, that executor
> gets blacklisted for the task. To retry the task, the scheduler then checks
> whether there is an idle blacklisted executor that can be killed and replaced;
> if there is none, it aborts the job without attempting the task
> spark.task.maxFailures times.
> With dynamic allocation this could be handled better: instead of killing a
> blacklisted idle executor (it is possible that there are no idle blacklisted
> executors at all), request an additional executor and retry the task on it.
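> Below is a simplified, self-contained sketch of the decision described above,
> for illustration only. It is not the actual Spark scheduler code, and every
> name in it (BlacklistDecisionSketch, ExecutorId, killExecutor, abortJob,
> requestAdditionalExecutor) is made up for this example:
> {code:scala}
> object BlacklistDecisionSketch {
>   case class ExecutorId(id: String)
>
>   // Stubs standing in for real scheduler actions.
>   def killExecutor(e: ExecutorId): Unit = println(s"killing idle blacklisted executor ${e.id}")
>   def abortJob(reason: String): Unit = println(s"aborting job: $reason")
>   def requestAdditionalExecutor(): Unit = println("requesting one additional executor")
>
>   // Current behaviour: when a task is blacklisted on every current executor,
>   // either free up an idle blacklisted executor or abort the job immediately,
>   // without spending the spark.task.maxFailures retry budget.
>   def currentBehaviour(idleBlacklisted: Option[ExecutorId]): Unit =
>     idleBlacklisted match {
>       case Some(e) => killExecutor(e)
>       case None    => abortJob("task cannot run on any current executor")
>     }
>
>   // Proposed behaviour under dynamic allocation: ask for one more executor and
>   // retry the task on it instead of aborting.
>   def proposedBehaviour(idleBlacklisted: Option[ExecutorId]): Unit =
>     idleBlacklisted match {
>       case Some(e) => killExecutor(e)
>       case None    => requestAdditionalExecutor()
>     }
> }
> {code}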
> This can easily be reproduced with a simple job like the one below. The example
> is expected to fail eventually; the point is that it is not retried
> spark.task.maxFailures times:
> {code:scala}
> // The Int-to-String cast deliberately throws a ClassCastException in every task.
> def test(a: Int): String = a.asInstanceOf[String]
> sc.parallelize(1 to 10, 10).map(x => test(x)).collect()
> {code}
> Run it with dynamic allocation enabled and the minimum number of executors set
> to 1. There are various other cases where a job can be aborted this way as well.
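> A minimal sketch of the setup assumed for the reproduction above, written as a
> script. The property names are standard Spark configuration keys, but the
> concrete values and the application name are illustrative rather than taken
> from the report; in spark-shell the same properties would be passed with
> --conf instead of building a SparkConf by hand:
> {code:scala}
> import org.apache.spark.{SparkConf, SparkContext}
>
> // Blacklisting and dynamic allocation both enabled, minimum of one executor.
> val conf = new SparkConf()
>   .setAppName("blacklist-dynamic-allocation-repro")
>   .set("spark.blacklist.enabled", "true")
>   .set("spark.dynamicAllocation.enabled", "true")
>   .set("spark.dynamicAllocation.minExecutors", "1")
>   .set("spark.shuffle.service.enabled", "true")   // required for dynamic allocation
>   .set("spark.task.maxFailures", "4")             // expected retry budget
>
> val sc = new SparkContext(conf)
>
> // Every task fails with a ClassCastException; with the behaviour described
> // above the job is aborted before the 4 allowed attempts are used.
> def test(a: Int): String = a.asInstanceOf[String]
> sc.parallelize(1 to 10, 10).map(x => test(x)).collect()
> {code}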
--
This message was sent by Atlassian Jira
(v8.3.4#803005)