[
https://issues.apache.org/jira/browse/TEZ-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203589#comment-14203589
]
Siddharth Seth commented on TEZ-1750:
-------------------------------------
Marked as Expert level. I don't think unstable is required though... it allows
the DAGScheduler to be configured, which is something we do need.
1-1 edges can be problematic in such cases. Realistically, though, if the
consumer of a 1-1 edge produces final output, that output isn't read until the
DAG completes, at least at the moment. Otherwise, it's likely to be connected
to some other source which requires all outputs.
This could be addressed, in a future patch, with additional details being sent
over in the Task launch request.
The bigger problem this addresses is slow start on a producer (either
configured or due to reduce parallelism changes) with no slow start on
downstream vertices, which can cause some bad scheduling behaviour. One intent
of slow start is to prevent unnecessary cluster utilization; however, we can
end up in situations where we not only use the cluster unnecessarily, but in
the process also harm the currently executing DAG.
> Add a DAGScheduler which schedules tasks only when sources have been scheduled
> ------------------------------------------------------------------------------
>
> Key: TEZ-1750
> URL: https://issues.apache.org/jira/browse/TEZ-1750
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Priority: Critical
> Attachments: TEZ-1750.1.txt, TEZ-1750.2.txt
>
>
> Splitting out the patch on TEZ-1522 into a separate jira.
> There are several scenarios in which we end up scheduling downstream tasks
> before their sources have been scheduled - and then get into a situation
> where the sources are starved. Currently, anywhere a ShuffleVertexManager is
> used can cause such behaviour - since it starts scheduling its tasks after a
> certain number of sources are complete, but subsequent non-shuffle
> VertexManagers will schedule immediately.
> Disabling slow-start is one option to achieve this (or setting slow start on
> all vertices), but it doesn't work for the situation where dynamic reducer
> parallelism kicks in - since it has to wait for source tasks to complete.
> The intent here is to add a DAGScheduler, which effectively negates the slow
> start, and in case of dynamic parallelism determination, waits for upstream
> tasks to be scheduled before scheduling downstream tasks.
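
To make the intended behaviour concrete, here is a minimal sketch of the gating
idea, not the code in the attached patches. Everything in it is hypothetical:
the class SourceGatedScheduler, its methods, and the vertex and task names are
invented for illustration and are not Tez APIs. It also assumes that "sources
have been scheduled" means every task of every source vertex has already been
handed off for scheduling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not the Tez DAGScheduler API. */
final class SourceGatedScheduler {
    private final Map<String, Integer> totalTasks = new HashMap<>();
    private final Map<String, Integer> scheduledTasks = new HashMap<>();
    private final Map<String, List<String>> sources = new HashMap<>();
    // Insertion-ordered so held requests are released in a predictable order.
    private final Map<String, List<String>> pending = new LinkedHashMap<>();
    private final List<String> launched = new ArrayList<>();

    /** Registers a vertex with its task count and its upstream (source) vertices. */
    void addVertex(String vertex, int numTasks, String... sourceVertices) {
        totalTasks.put(vertex, numTasks);
        scheduledTasks.put(vertex, 0);
        sources.put(vertex, Arrays.asList(sourceVertices));
        pending.put(vertex, new ArrayList<>());
    }

    /** Called when a vertex asks for one of its tasks to be scheduled. */
    void requestTask(String vertex, String taskId) {
        pending.get(vertex).add(taskId);
        drain();
    }

    /** Tasks actually handed off so far, in hand-off order. */
    List<String> launched() {
        return launched;
    }

    private boolean fullyScheduled(String vertex) {
        return scheduledTasks.get(vertex).equals(totalTasks.get(vertex));
    }

    private boolean sourcesScheduled(String vertex) {
        for (String source : sources.get(vertex)) {
            if (!fullyScheduled(source)) {
                return false;
            }
        }
        return true;
    }

    /** Releases held requests whose sources are fully scheduled, repeating until nothing changes. */
    private void drain() {
        boolean progress = true;
        while (progress) {
            progress = false;
            for (Map.Entry<String, List<String>> entry : pending.entrySet()) {
                List<String> held = entry.getValue();
                if (!held.isEmpty() && sourcesScheduled(entry.getKey())) {
                    launched.addAll(held);
                    scheduledTasks.merge(entry.getKey(), held.size(), Integer::sum);
                    held.clear();
                    progress = true;
                }
            }
        }
    }
}
```

In this sketch a downstream request that arrives early is simply held, so it
cannot take cluster resources ahead of its sources. Once the last source task
is requested, the held downstream requests are released in the same pass.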