Kay Ousterhout created SPARK-2294:
-------------------------------------
Summary: TaskSchedulerImpl and TaskSetManager do not properly
prioritize which tasks get assigned to an executor
Key: SPARK-2294
URL: https://issues.apache.org/jira/browse/SPARK-2294
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.0.0, 1.0.1
Reporter: Kay Ousterhout
If an executor E is free, a speculative copy of a task may be assigned to E
even when other tasks in the job have not been launched at all yet.
Similarly, a task without any locality preferences may be assigned to E even
when there is a NODE_LOCAL task that could have been scheduled on E instead.
This happens because TaskSchedulerImpl calls TaskSetManager.resourceOffer
(which in turn calls TaskSetManager.findTask) with increasing locality levels,
beginning with PROCESS_LOCAL, followed by NODE_LOCAL, and so on until the
highest currently allowed level. Now, suppose NODE_LOCAL is the highest
currently allowed locality level. findTask is first called with max level
PROCESS_LOCAL; if it cannot find any PROCESS_LOCAL
tasks, it will try to schedule tasks with no locality preferences or
speculative tasks. As a result, speculative tasks or tasks with no preferences
may be scheduled instead of NODE_LOCAL tasks.
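
Here is a minimal, self-contained Scala sketch of this fall-through (not the
actual Spark source: the object name, queue contents, and helper shapes are
simplified stand-ins, though findTask, isAllowed, and the locality levels
mirror the real names). With one pending NODE_LOCAL task and one
no-preference task, the very first offer at max level PROCESS_LOCAL already
returns the no-preference task:

{code:scala}
object LocalityFallthroughSketch {

  // Mirrors Spark's TaskLocality ordering: PROCESS_LOCAL is the most
  // restrictive level, ANY the least restrictive.
  object TaskLocality extends Enumeration {
    val PROCESS_LOCAL, NODE_LOCAL, RACK_LOCAL, ANY = Value
    // A task at level `condition` may launch only if it is at least as
    // local as the current constraint.
    def isAllowed(constraint: Value, condition: Value): Boolean =
      condition <= constraint
  }
  import TaskLocality._

  // Hypothetical pending queues for one offer: no PROCESS_LOCAL tasks, one
  // NODE_LOCAL task (index 1), and one task with no preferences (index 2).
  val pendingForThisExecutor: List[Int] = Nil
  val pendingForThisHost: List[Int] = List(1)
  val pendingWithNoPrefs: List[Int] = List(2)

  // Same shape as TaskSetManager.findTask: the NODE_LOCAL lookup is gated
  // on the max locality level passed in, but the no-preference fallback
  // (and, in the real code, the speculative fallback) is not.
  def findTask(maxLocality: Value): Option[(Int, Value)] =
    pendingForThisExecutor.headOption.map(i => (i, PROCESS_LOCAL))
      .orElse {
        if (isAllowed(maxLocality, NODE_LOCAL))
          pendingForThisHost.headOption.map(i => (i, NODE_LOCAL))
        else None
      }
      // Reached even when maxLocality is PROCESS_LOCAL, so a no-pref task
      // wins over a schedulable NODE_LOCAL task.
      .orElse(pendingWithNoPrefs.headOption.map(i => (i, PROCESS_LOCAL)))

  def main(args: Array[String]): Unit = {
    // TaskSchedulerImpl's first offer to the free executor uses the most
    // restrictive level; it returns the no-pref task (index 2) even though
    // NODE_LOCAL task 1 is pending and would be allowed one offer later.
    println(findTask(PROCESS_LOCAL)) // Some((2,PROCESS_LOCAL))
  }
}
{code}

In the real scheduler the executor's free slot is consumed by that
assignment, so the pending NODE_LOCAL task loses its chance at this round
of offers.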
cc [~matei]