[
https://issues.apache.org/jira/browse/SPARK-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-1937:
---------------------------------
Assignee: Rui Li
> Tasks can be submitted before executors are registered
> ------------------------------------------------------
>
> Key: SPARK-1937
> URL: https://issues.apache.org/jira/browse/SPARK-1937
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.0.0
> Reporter: Rui Li
> Assignee: Rui Li
> Fix For: 1.1.0
>
> Attachments: After-patch.PNG, Before-patch.png, RSBTest.scala
>
>
> During construction, TaskSetManager assigns tasks to several pending
> lists according to the tasks’ preferred locations. If the desired location is
> unavailable, the task is instead assigned to “pendingTasksWithNoPrefs”, a list
> containing tasks without preferred locations.
> The problem is that tasks may be submitted before the executors get
> registered with the driver, in which case TaskSetManager will assign all the
> tasks to pendingTasksWithNoPrefs. Later, when it looks for a task to schedule,
> it will pick one from this list and assign it to an arbitrary executor, since
> TaskSetManager considers that these tasks can run equally well on any node.
> This deprives the job of the benefits of data locality, slows the whole job
> down, and can cause load imbalance between executors.
> I ran into this issue when running a Spark program on a 7-node cluster
> (node6~node12). The program processes 100GB of data.
> Since the data is uploaded to HDFS from node6, this node has a complete copy
> of the data. As a result, node6 finishes tasks much faster, which in turn
> makes it complete disproportionately more tasks than the other nodes.
> To solve this issue, I think we shouldn't check the availability of
> executors/hosts when constructing the TaskSetManager. If a task prefers a node,
> we simply add the task to that node’s pending list. When the node is later
> added, TaskSetManager can schedule the task at the proper locality
> level. If, unfortunately, the preferred node(s) never get added,
> TaskSetManager can still schedule the task at locality level “ANY”.
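> The proposed bookkeeping can be sketched roughly as follows (a minimal,
> illustrative Scala sketch, not Spark's actual TaskSetManager code; all class
> and method names here are hypothetical):

```scala
import scala.collection.mutable

// Illustrative stand-in for a task with an optional preferred host.
case class Task(id: Int, preferredHost: Option[String])

class PendingLists {
  val pendingTasksForHost =
    mutable.HashMap[String, mutable.ArrayBuffer[Int]]()
  val pendingTasksWithNoPrefs = mutable.ArrayBuffer[Int]()

  // Proposed behavior: do NOT check whether the preferred host currently
  // has a registered executor; just file the task under that host. Only
  // genuinely preference-free tasks go to pendingTasksWithNoPrefs.
  def addTask(task: Task): Unit = task.preferredHost match {
    case Some(host) =>
      pendingTasksForHost.getOrElseUpdate(host, mutable.ArrayBuffer()) += task.id
    case None =>
      pendingTasksWithNoPrefs += task.id
  }

  // NODE_LOCAL attempt: only serves tasks whose preference matches `host`,
  // so a host that registers late can still pick up its local tasks.
  def dequeueForHost(host: String): Option[Int] =
    pendingTasksForHost.get(host).collect {
      case buf if buf.nonEmpty => buf.remove(buf.length - 1)
    }

  // ANY fallback: if the preferred node(s) never get added, any still-pending
  // task remains schedulable on an arbitrary executor.
  def dequeueAny(): Option[Int] = {
    val fromHostLists = pendingTasksForHost.valuesIterator
      .find(_.nonEmpty)
      .map(buf => buf.remove(buf.length - 1))
    fromHostLists.orElse {
      if (pendingTasksWithNoPrefs.nonEmpty)
        Some(pendingTasksWithNoPrefs.remove(pendingTasksWithNoPrefs.length - 1))
      else None
    }
  }
}
```

> Under this scheme, tasks submitted before any executor registers still end up
> in their preferred hosts' lists, so locality is recovered as soon as those
> hosts come up, while the ANY path preserves liveness.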
--
This message was sent by Atlassian JIRA
(v6.2#6252)