Jie Huang created SPARK-9193:
--------------------------------
Summary: Avoid assigning tasks to executors under killing
Key: SPARK-9193
URL: https://issues.apache.org/jira/browse/SPARK-9193
Project: Spark
Issue Type: Bug
Components: Scheduler
Affects Versions: 1.4.1, 1.4.0
Reporter: Jie Huang
Now, when some executors are killed by dynamic allocation, new tasks can
sometimes be mis-assigned to those lost executors. Such mis-assignment causes
task failures, or even job failure if the error repeats 4 times.
The root cause is that killExecutors doesn't remove the executors under
killing right away. Instead, it relies on a later OnDisassociated event to
refresh the active executor list, and the delay depends on the cluster status
(from several milliseconds to sub-minute). Any task scheduled during that
window can be assigned to an "active" but "under killing" executor, and then
fails with "executor lost". A better approach is to exclude executors under
killing in makeOffers(), so that no tasks are offered to executors that are
about to be lost; see the sketch below.
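
A minimal sketch of the idea (hypothetical class and field names, loosely
modeled on CoarseGrainedSchedulerBackend's executorDataMap and
executorsPendingToRemove; not the actual patch):

{code:scala}
import scala.collection.mutable

case class WorkerOffer(executorId: String, host: String, cores: Int)

class SchedulerBackendSketch {
  // Executors currently registered with the driver.
  private val executorDataMap = mutable.HashMap[String, WorkerOffer]()
  // Executors that killExecutors() has asked the cluster manager to kill,
  // but whose OnDisassociated event has not arrived yet.
  private val executorsPendingToRemove = mutable.HashSet[String]()

  def registerExecutor(id: String, host: String, cores: Int): Unit =
    executorDataMap(id) = WorkerOffer(id, host, cores)

  // Mark executors as pending removal immediately, instead of waiting
  // for the disassociation event to clean up the active list.
  def killExecutors(ids: Seq[String]): Unit =
    executorsPendingToRemove ++= ids

  // Called once the OnDisassociated event finally arrives.
  def removeExecutor(id: String): Unit = {
    executorDataMap -= id
    executorsPendingToRemove -= id
  }

  // makeOffers skips executors that are "active" but under killing,
  // so no new tasks land on an executor that is about to be lost.
  def makeOffers(): Seq[WorkerOffer] =
    executorDataMap.collect {
      case (id, offer) if !executorsPendingToRemove.contains(id) => offer
    }.toSeq
}
{code}

With this filtering, offers made between killExecutors() and the
disassociation event no longer include the dying executors, so the
window for mis-assignment disappears.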