[ 
https://issues.apache.org/jira/browse/TEZ-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202327#comment-14202327
 ] 

Bikas Saha commented on TEZ-1750:
---------------------------------

typo - 
{code}
of the type od edges
// Tacks vertices f
// A new taks coming
{code}

If pending events are always put into the queue first, we can avoid the extra 
code that handles the current event separately when scheduling is triggered.
{code}
+      boolean scheduled = trySchedulingVertex(vertex);
+      if (scheduled) {
+        LOG.info("Scheduled vertex: " + vertex.getLogIdentifier());
+        // If ready to be scheduled, send out pending events and the current event.
+        // Send events out for this vertex first. Then try scheduling downstream vertices.
+        sendEventsForVertex(vertex.getName());
+        sendEvent(attemptEvent);
+        if (LOG.isDebugEnabled()) {
+          LOG.debug("Processing downstream vertices for vertex: " + vertex.getLogIdentifier());
+        }
+        processDownstreamVertices(vertex);
+      } else {
+        pendingEvents.put(vertex.getName(), attemptEvent);
{code}
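
A minimal sketch of the enqueue-first idea, to show why the separate send of the current event becomes unnecessary. The names here (PendingEventBuffer, onEvent, flush) are hypothetical and not taken from the patch, and events are plain strings for brevity:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the scheduler's pending-event handling; not Tez code.
class PendingEventBuffer {
  // Vertex name -> events waiting for that vertex to become schedulable.
  private final Map<String, List<String>> pendingEvents = new HashMap<>();
  // Events already released, in release order.
  private final List<String> sent = new ArrayList<>();

  // Always enqueue first; flush only if the vertex is schedulable.
  void onEvent(String vertexName, String event, boolean scheduled) {
    pendingEvents.computeIfAbsent(vertexName, k -> new ArrayList<>()).add(event);
    if (scheduled) {
      flush(vertexName);
    }
  }

  // Releases every queued event for the vertex, oldest first.
  void flush(String vertexName) {
    List<String> events = pendingEvents.remove(vertexName);
    if (events != null) {
      sent.addAll(events);
    }
  }

  List<String> sentEvents() {
    return sent;
  }

  int pendingCount(String vertexName) {
    List<String> events = pendingEvents.get(vertexName);
    return events == null ? 0 : events.size();
  }
}
```

With this shape the current event leaves through the same flush path as the older pending ones, so ordering is preserved without a second send call.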

0-task vertices would never have any scheduling requests for tasks; they 
immediately move to the succeeded state. This should in fact be a precondition. 
If needed, the scheduler could check for 0-task vertices by traversing the graph 
initially or by listening for vertex status updates.
{code} if (taskAttemptID != null) { // null for 0 task vertices{code}
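
A sketch of what such a precondition might look like, assuming the scheduler can look up a vertex's total task count. SchedulerPreconditions and checkHasTasks are hypothetical names used only for illustration:

```java
// Hypothetical guard; the class, method and parameter names are illustrative only.
class SchedulerPreconditions {
  // Fails fast if a scheduling request arrives for a vertex that has no tasks.
  static void checkHasTasks(String vertexName, int totalTasks) {
    if (totalTasks <= 0) {
      throw new IllegalStateException(
          "Unexpected scheduling request for 0-task vertex: " + vertexName);
    }
  }
}
```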

Shouldn't this just break? If one output vertex is scheduled then all would be 
scheduled, right?
{code}
+    for (Vertex destVertex : outputVertexEdgeMap.keySet()) {
+      if (vertexAlreadyScheduled(destVertex)) { // Nothing to do if already scheduled.
+      } else {
{code}

Is this susceptible to counting errors caused by attempt retries (due to 
failures, speculation, etc.)?
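
One way to make the count robust to retries, sketched under the assumption that each attempt carries its task index: track distinct tasks per vertex instead of incrementing a counter per attempt. ScheduledTaskTracker and its methods are hypothetical names, not part of the patch:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical tracker; names are illustrative and not from the patch.
class ScheduledTaskTracker {
  // Vertex name -> indices of tasks that have had at least one attempt scheduled.
  private final Map<String, Set<Integer>> scheduledTasks = new HashMap<>();

  // Records an attempt. The attempt number is deliberately ignored, so retries and
  // speculative attempts of the same task do not inflate the count.
  // Returns true only for the first attempt seen for the task.
  boolean recordAttempt(String vertexName, int taskIndex, int attemptNumber) {
    return scheduledTasks.computeIfAbsent(vertexName, k -> new HashSet<>()).add(taskIndex);
  }

  // Number of distinct tasks of the vertex with at least one scheduled attempt.
  int scheduledTaskCount(String vertexName) {
    Set<Integer> tasks = scheduledTasks.get(vertexName);
    return tasks == null ? 0 : tasks.size();
  }
}
```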

This is probably not going to work well with a 1-1 only edge, where there is no 
global dependency, so waiting for all sources to get scheduled before 
scheduling 1-1 downstream tasks would not be correct.

Should probably rename this to WithThrottling rather than V2.


> Add a DAGScheduler which schedules tasks only when sources have been scheduled
> ------------------------------------------------------------------------------
>
>                 Key: TEZ-1750
>                 URL: https://issues.apache.org/jira/browse/TEZ-1750
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>            Priority: Critical
>         Attachments: TEZ-1750.1.txt
>
>
> Splitting out the patch on TEZ-1522 into a separate jira.
> There's several scenarios in which we end up scheduling downstream tasks 
> before their sources have been scheduled - and then get into a situation 
> where the sources are starved. Currently, anywhere a ShuffleVertexManager is 
> used can cause such behaviour - since it starts scheduling its tasks after a 
> certain number of sources are complete, but subsequent non-shuffle 
> VertexManagers will schedule immediately.
> Disabling slow-start is one option to achieve this (or setting slow start on 
> all vertices), but it doesn't work for the situation where dynamic reducer 
> parallelism kicks in - since it has to wait for source tasks to complete.
> The intent here is to add a DAGScheduler, which effectively negates the slow 
> start, and in case of dynamic parallelism determination, waits for upstream 
> tasks to be scheduled before scheduling downstream tasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
