Patrick Woody created SPARK-8167:
------------------------------------
Summary: Tasks that fail due to YARN preemption can cause job failure
Key: SPARK-8167
URL: https://issues.apache.org/jira/browse/SPARK-8167
Project: Spark
Issue Type: Bug
Components: Scheduler, YARN
Affects Versions: 1.3.1
Reporter: Patrick Woody
Tasks that are running on preempted executors are counted as FAILED with an
ExecutorLostFailure. Unfortunately, this can quickly spiral out of control when
a large resource shift is occurring and the retried tasks get scheduled onto
executors that are immediately preempted as well.
The current workaround is to set spark.task.maxFailures very high, but that
delays the detection of true failures. Ideally, we should differentiate these
task statuses so that failures caused by preemption don't count towards the
failure limit.
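To illustrate the proposed behavior, here is a minimal toy model in Python. It
is not Spark's scheduler code: FailureReason, FailureTracker, and
record_failure are hypothetical names made up for this sketch. The only detail
taken from Spark is the spark.task.maxFailures default of 4.

```python
from collections import Counter
from enum import Enum


class FailureReason(Enum):
    # Hypothetical categories for this sketch, not Spark's real task end reasons.
    TASK_ERROR = "task_error"  # a true failure, e.g. an exception in user code
    PREEMPTED = "preempted"    # the executor was taken away by YARN


class FailureTracker:
    """Toy per-task failure counter with a max-failures limit."""

    def __init__(self, max_failures=4):
        # 4 mirrors the default value of spark.task.maxFailures.
        self.max_failures = max_failures
        self.counts = Counter()

    def record_failure(self, task_id, reason):
        """Record one failed attempt; return True if the job should abort."""
        if reason is FailureReason.PREEMPTED:
            # Preemption is not the task's fault, so it is not counted.
            return False
        self.counts[task_id] += 1
        return self.counts[task_id] >= self.max_failures


tracker = FailureTracker(max_failures=4)

# Repeated preemptions of the same task never trip the limit.
for _ in range(10):
    assert not tracker.record_failure(0, FailureReason.PREEMPTED)

# True failures still abort the job once the limit is reached.
results = [tracker.record_failure(0, FailureReason.TASK_ERROR) for _ in range(4)]
print(results)
```

With this separation the failure limit could stay low, so true failures would
still surface quickly even during a large resource shift.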