[
https://issues.apache.org/jira/browse/SPARK-20219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
jin xing updated SPARK-20219:
-----------------------------
Attachment: screenshot-1.png
> Schedule tasks based on size of input from ScheduledRDD
> -------------------------------------------------------
>
> Key: SPARK-20219
> URL: https://issues.apache.org/jira/browse/SPARK-20219
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.1.0
> Reporter: jin xing
> Attachments: screenshot-1.png
>
>
> When data in a ShuffledRDD is highly skewed, it makes sense to launch the
> tasks that process the most input as early as possible. The current
> scheduling mechanism in *TaskSetManager* is quite simple:
> {code}
> for (i <- (0 until numTasks).reverse) {
>   addPendingTask(i)
> }
> {code}
> In the scenario where the "large tasks" sit in the bottom half of the tasks
> array, launching the tasks with the most input early can significantly
> reduce the total time cost and save resources when *"dynamic allocation"* is
> disabled.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]