Siddharth Seth created TEZ-1750:
-----------------------------------

             Summary: Add a DAGScheduler which schedules tasks only when 
sources have been scheduled
                 Key: TEZ-1750
                 URL: https://issues.apache.org/jira/browse/TEZ-1750
             Project: Apache Tez
          Issue Type: Improvement
            Reporter: Siddharth Seth
            Assignee: Siddharth Seth
            Priority: Critical


Splitting out the patch on TEZ-1522 into a separate jira.

There are several scenarios in which we end up scheduling downstream tasks before 
their sources have been scheduled - and then get into a situation where the 
sources are starved. Currently, anywhere a ShuffleVertexManager is used can 
cause such behaviour - since it starts scheduling its tasks only after a certain 
number of sources are complete, but subsequent non-shuffle VertexManagers will 
schedule their tasks immediately.
Disabling slow-start is one option to avoid this (or setting slow start on 
all vertices), but it doesn't work for the situation where dynamic reducer 
parallelism kicks in - since that has to wait for source tasks to complete.

The intent here is to add a DAGScheduler which effectively negates the slow 
start and, in case of dynamic parallelism determination, waits for upstream 
tasks to be scheduled before scheduling downstream tasks.
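
A rough sketch of the ordering rule described above, to make the intent 
concrete. This is not the Tez DAGScheduler API: the class SourceAwareScheduler 
and its methods are hypothetical, and it tracks whole vertices rather than 
individual tasks, which is a simplification. A schedule request is held back 
until every source vertex has itself been released.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; names do not come from the Tez code base.
public class SourceAwareScheduler {
    // Vertex name -> names of its source (upstream) vertices.
    private final Map<String, List<String>> sources = new HashMap<>();
    // Vertices that have already been released for scheduling.
    private final Set<String> scheduled = new HashSet<>();
    // Vertices that asked to be scheduled but still wait on a source.
    private final Set<String> pending = new LinkedHashSet<>();
    // Order in which vertices were actually released.
    private final List<String> released = new ArrayList<>();

    public void addVertex(String name, List<String> sourceNames) {
        sources.put(name, new ArrayList<>(sourceNames));
    }

    // Called when a vertex wants its tasks scheduled (for example, because
    // its vertex manager decided to start). The request is only honoured
    // once all of the vertex's sources have been released.
    public void requestSchedule(String vertex) {
        pending.add(vertex);
        drain();
    }

    // Release every pending vertex whose sources are all scheduled, and
    // repeat until no further vertex can be released.
    private void drain() {
        boolean progress = true;
        while (progress) {
            progress = false;
            for (String v : new ArrayList<>(pending)) {
                List<String> srcs =
                    sources.getOrDefault(v, Collections.<String>emptyList());
                if (scheduled.containsAll(srcs)) {
                    pending.remove(v);
                    scheduled.add(v);
                    released.add(v);
                    progress = true;
                }
            }
        }
    }

    public List<String> releasedOrder() {
        return new ArrayList<>(released);
    }

    public static void main(String[] args) {
        SourceAwareScheduler s = new SourceAwareScheduler();
        s.addVertex("map", Collections.<String>emptyList());
        s.addVertex("reduce", Collections.singletonList("map"));
        s.requestSchedule("reduce"); // held: "map" has not been scheduled yet
        s.requestSchedule("map");    // releases "map", then "reduce"
        System.out.println(s.releasedOrder()); // prints [map, reduce]
    }
}
```

In this sketch the downstream request arrives first but is released second, so 
the source can never be starved by its own consumer.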



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
