[ 
https://issues.apache.org/jira/browse/TEZ-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202405#comment-14202405
 ] 

Siddharth Seth commented on TEZ-1750:
-------------------------------------

bq. typo -
Fixed.

bq. If pending events are first put into the queue then we can avoid extra code 
to handle them if scheduling is triggered.
Are you referring to the two invocations - 
sendEventsForVertex(vertex.getName()) and  sendEvent(attemptEvent); ? There was 
a comment in the earlier patch about the pending list being empty if the vertex 
has already been scheduled. It's also skipping the lookup of the list from the 
pendingEvents map.
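
For illustration only, a minimal sketch of the two paths being discussed. The method names sendEventsForVertex and sendEvent and the pendingEvents map come from the comment above; the class name, the readyVertices set, the onSourcesScheduled hook and the use of plain strings as events are assumptions, not the code in the patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the scheduler's event buffering; not the patch itself.
class PendingEventBuffer {
  // vertex name -> attempt events held back until the vertex may be scheduled
  private final Map<String, List<String>> pendingEvents = new HashMap<>();
  // vertices whose pending list has already been flushed (assumed bookkeeping)
  private final Set<String> readyVertices = new HashSet<>();
  // records what was "sent", in order, so the behaviour can be inspected
  final List<String> sent = new ArrayList<>();

  void scheduleTask(String vertexName, String attemptEvent) {
    if (readyVertices.contains(vertexName)) {
      // Vertex already released: its pending list is known to be empty,
      // so send directly and skip the pendingEvents lookup.
      sendEvent(attemptEvent);
    } else {
      pendingEvents.computeIfAbsent(vertexName, k -> new ArrayList<>()).add(attemptEvent);
    }
  }

  // Assumed hook: called once the sources of the vertex have been scheduled.
  void onSourcesScheduled(String vertexName) {
    readyVertices.add(vertexName);
    sendEventsForVertex(vertexName);
  }

  void sendEventsForVertex(String vertexName) {
    List<String> pending = pendingEvents.remove(vertexName);
    if (pending != null) {
      for (String event : pending) {
        sendEvent(event);
      }
    }
  }

  void sendEvent(String attemptEvent) {
    sent.add(attemptEvent);
  }
}
```

Putting every event through the queue first would remove the direct sendEvent branch, at the cost of one map lookup per event for vertices that have already been released.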

bq. 0 task vertices would never have any scheduling requests for tasks. they 
immediately move to succeeded state. This should in fact be a precondition. If 
needed, it could check for 0 task vertices by traversing the graph initially or 
by listening for vertex status updates.
They won't have schedule requests. However, to fit in with the rest of the 
code, an entry is required in the relevant maps, which is what this snippet 
does. The first time we see such vertices - we move them into completed state - 
instead of pre-traversing the graph.
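
A rough sketch of that lazy handling, under assumptions: VertexStateMap, VertexState and getOrCreate are hypothetical names, and "completed" here only means the scheduler has nothing left to wait for on that vertex.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical bookkeeping; the maps in the actual patch may look different.
class VertexStateMap {
  static final class VertexState {
    final int totalTasks;
    boolean completed;

    VertexState(int totalTasks) {
      this.totalTasks = totalTasks;
    }
  }

  private final Map<String, VertexState> states = new HashMap<>();

  // Returns the entry for a vertex, creating it the first time the vertex is seen.
  VertexState getOrCreate(String vertexName, int totalTasks) {
    return states.computeIfAbsent(vertexName, name -> {
      VertexState state = new VertexState(totalTasks);
      // A 0-task vertex never produces a schedule request, so it is marked
      // completed on first sight instead of being found by a graph pre-traversal.
      state.completed = (totalTasks == 0);
      return state;
    });
  }
}
```

With this shape, downstream checks can treat every source vertex uniformly, because a 0-task source already has an entry that reports itself as completed.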

bq. Is this susceptible to counting errors because of attempt retries due to 
failures/speculation etc?
Not in terms of counting scheduled tasks - since it counts tasks rather than 
attempts. There's a unit test for that. However, it cannot deal with attempts 
which may get killed after the schedule request goes out.
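
To make the task-versus-attempt distinction concrete, a small sketch (ScheduledTaskCounter and its methods are hypothetical, not the names used in the patch): keying on the task index means a retried or speculative attempt of the same task does not increase the count.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; illustrates counting tasks, not attempts.
class ScheduledTaskCounter {
  // vertex name -> indices of tasks with at least one schedule request
  private final Map<String, Set<Integer>> scheduledTasks = new HashMap<>();

  // Records a schedule request. Returns true only the first time the task is seen.
  boolean onAttemptScheduled(String vertexName, int taskIndex, int attemptNumber) {
    // attemptNumber is deliberately ignored: retries and speculative attempts
    // map to the same taskIndex, and Set.add rejects the duplicate.
    return scheduledTasks
        .computeIfAbsent(vertexName, k -> new HashSet<>())
        .add(taskIndex);
  }

  int scheduledTaskCount(String vertexName) {
    Set<Integer> tasks = scheduledTasks.get(vertexName);
    return tasks == null ? 0 : tasks.size();
  }
}
```

The sketch shares the limitation mentioned above: nothing removes a task from the set if its attempt is killed after the schedule request has gone out.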

bq. Should probably rename this to WithThrottling rather than V2.
Will see if I can come up with a different name. Maybe WithThrottling works - 
want to make sure the name is not restrictive for future additions like slow 
start. I was actually thinking of renaming DAGSchedulerNaturalOrder to 
DAGSchedulerNaturalOrderSimple and just calling this one 
DAGSchedulerNaturalOrder.

> Add a DAGScheduler which schedules tasks only when sources have been scheduled
> ------------------------------------------------------------------------------
>
>                 Key: TEZ-1750
>                 URL: https://issues.apache.org/jira/browse/TEZ-1750
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>            Priority: Critical
>         Attachments: TEZ-1750.1.txt
>
>
> Splitting out the patch on TEZ-1522 into a separate jira.
> There are several scenarios in which we end up scheduling downstream tasks 
> before their sources have been scheduled - and then get into a situation 
> where the sources are starved. Currently, any place where a 
> ShuffleVertexManager is used can cause such behaviour - since it starts 
> scheduling its tasks after a certain number of sources are complete, but 
> subsequent non-shuffle VertexManagers will schedule immediately.
> Disabling slow-start is one option to achieve this (or setting slow start on 
> all vertices), but it doesn't work for the situation where dynamic reducer 
> parallelism kicks in - since it has to wait for source tasks to complete.
> The intent here is to add a DAGScheduler, which effectively negates the slow 
> start, and in case of dynamic parallelism determination, waits for upstream 
> tasks to be scheduled before scheduling downstream tasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
