[
https://issues.apache.org/jira/browse/TEZ-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202327#comment-14202327
]
Bikas Saha commented on TEZ-1750:
---------------------------------
Typos:
{code}
of the type od edges
// Tacks vertices f
// A new taks coming
{code}
If pending events are first put into the queue, then we can avoid the extra code
to handle them when scheduling is triggered.
{code}
+ boolean scheduled = trySchedulingVertex(vertex);
+ if (scheduled) {
+   LOG.info("Scheduled vertex: " + vertex.getLogIdentifier());
+   // If ready to be scheduled, send out pending events and the current event.
+   // Send events out for this vertex first. Then try scheduling downstream vertices.
+   sendEventsForVertex(vertex.getName());
+   sendEvent(attemptEvent);
+   if (LOG.isDebugEnabled()) {
+     LOG.debug("Processing downstream vertices for vertex: " + vertex.getLogIdentifier());
+   }
+   processDownstreamVertices(vertex);
+ } else {
+   pendingEvents.put(vertex.getName(), attemptEvent);
{code}
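To make the queue-first suggestion concrete, here is a minimal sketch. It assumes events and vertices can be represented as plain strings; the class `QueueFirstScheduler`, its `sent` list, and the `canSchedule` flag are hypothetical stand-ins, not the Tez API or the code in the patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the scheduler's event handling; not Tez's actual API.
final class QueueFirstScheduler {
  // Events held per vertex name until that vertex is scheduled.
  private final Map<String, List<String>> pendingEvents = new HashMap<>();
  // Events that have been sent out, in order (stands in for sendEvent()).
  final List<String> sent = new ArrayList<>();

  // Always enqueue the incoming event first, then flush if the vertex can be scheduled.
  // canSchedule is an assumed stand-in for the result of trySchedulingVertex(vertex).
  void onAttemptEvent(String vertexName, String event, boolean canSchedule) {
    pendingEvents.computeIfAbsent(vertexName, k -> new ArrayList<>()).add(event);
    if (canSchedule) {
      sendEventsForVertex(vertexName);
    }
  }

  // Drains and sends everything queued for the vertex, preserving arrival order.
  private void sendEventsForVertex(String vertexName) {
    List<String> events = pendingEvents.remove(vertexName);
    if (events != null) {
      sent.addAll(events);
    }
  }

  // Number of events still waiting for the given vertex.
  int pendingCount(String vertexName) {
    List<String> events = pendingEvents.get(vertexName);
    return events == null ? 0 : events.size();
  }
}
```

Because the current event always goes through the queue, the scheduled branch needs only one flush call and no separate send for the current event.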
0-task vertices would never have any scheduling requests for tasks; they
immediately move to the succeeded state. This should in fact be a precondition. If
needed, it could check for 0-task vertices by traversing the graph initially or
by listening for vertex status updates.
{code} if (taskAttemptID != null) { // null for 0 task vertices{code}
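As a rough illustration of the "traverse the graph initially" option, the sketch below scans a map of vertex names to task counts once and collects the vertices with no tasks. The class `ZeroTaskVertexScan` and the map-based DAG representation are assumptions for illustration only, not how the patch or Tez models vertices.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; the DAG is modelled as vertex name -> number of tasks.
final class ZeroTaskVertexScan {
  // Walks every vertex once and returns the names of those that have no tasks.
  static Set<String> findZeroTaskVertices(Map<String, Integer> taskCounts) {
    Set<String> zeroTask = new HashSet<>();
    for (Map.Entry<String, Integer> entry : taskCounts.entrySet()) {
      if (entry.getValue() == 0) {
        zeroTask.add(entry.getKey());
      }
    }
    return zeroTask;
  }
}
```

A scheduler could compute this set once at DAG start and treat those vertices as already done, so the event path would never need a null check for them.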
Shouldn't this just break? If one output vertex is scheduled then all would be
scheduled, right?
{code}
+ for (Vertex destVertex : outputVertexEdgeMap.keySet()) {
+   if (vertexAlreadyScheduled(destVertex)) { // Nothing to do if already scheduled.
+   } else {
{code}
Is this susceptible to counting errors from attempt retries caused by
failures, speculation, etc.?
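If the count is incremented once per attempt event, a retried or speculative attempt of the same task would be counted twice. One possible guard, sketched below, is to track distinct task indices per vertex rather than a running counter. `ScheduledTaskTracker` and its methods are hypothetical names, and whether the patch actually counts per attempt is not confirmed here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical tracker: counts distinct tasks, not attempts, per vertex.
final class ScheduledTaskTracker {
  private final Map<String, Set<Integer>> scheduledTasks = new HashMap<>();

  // Records an attempt for a task; returns true only the first time that task is seen.
  boolean recordAttempt(String vertexName, int taskIndex) {
    return scheduledTasks
        .computeIfAbsent(vertexName, k -> new HashSet<>())
        .add(taskIndex);
  }

  // Number of distinct tasks of the vertex that have had at least one attempt.
  int scheduledTaskCount(String vertexName) {
    Set<Integer> tasks = scheduledTasks.get(vertexName);
    return tasks == null ? 0 : tasks.size();
  }
}
```

With this shape, a second attempt for task 0 leaves the count unchanged, so a comparison against the vertex's total task count stays accurate across retries.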
This is probably not going to work well with 1-1 only edges, where there is no
global dependency, and hence waiting for all sources to get scheduled before
scheduling 1-1 downstream tasks would not be correct.
Should probably rename this to WithThrottling rather than V2.
> Add a DAGScheduler which schedules tasks only when sources have been scheduled
> ------------------------------------------------------------------------------
>
> Key: TEZ-1750
> URL: https://issues.apache.org/jira/browse/TEZ-1750
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Priority: Critical
> Attachments: TEZ-1750.1.txt
>
>
> Splitting out the patch on TEZ-1522 into a separate jira.
> There are several scenarios in which we end up scheduling downstream tasks
> before their sources have been scheduled - and then get into a situation
> where the sources are starved. Currently, anywhere a ShuffleVertexManager is
> used can cause such behaviour - since it starts scheduling its tasks after a
> certain number of sources are complete, but subsequent non-shuffle
> VertexManagers will schedule immediately.
> Disabling slow-start is one option to achieve this (or setting slow start on
> all vertices), but it doesn't work for the situation where dynamic reducer
> parallelism kicks in - since it has to wait for source tasks to complete.
> The intent here is to add a DAGScheduler, which effectively negates the slow
> start, and in case of dynamic parallelism determination, waits for upstream
> tasks to be scheduled before scheduling downstream tasks.