[ 
https://issues.apache.org/jira/browse/SPARK-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054862#comment-14054862
 ] 

Rui Li commented on SPARK-2387:
-------------------------------

PR created at
https://github.com/apache/spark/pull/1328

> Remove the stage barrier for better resource utilization
> --------------------------------------------------------
>
>                 Key: SPARK-2387
>                 URL: https://issues.apache.org/jira/browse/SPARK-2387
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: Rui Li
>
> DAGScheduler divides a Spark job into multiple stages according to RDD 
> dependencies. Whenever there’s a shuffle dependency, DAGScheduler creates a 
> shuffle map stage on the map side, and another stage depending on that stage.
> Currently, the downstream stage cannot start until all its depended stages 
> have finished. This barrier between stages leads to idle slots when waiting 
> for the last few upstream tasks to finish and thus wasting cluster resources.
> Therefore we propose to remove the barrier and pre-start the reduce stage 
> once there're free slots. This can achieve better resource utilization and 
> improve the overall job performance, especially when there're lots of 
> executors granted to the application.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to