[
https://issues.apache.org/jira/browse/TEZ-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yingda Chen reassigned TEZ-4060:
--------------------------------
Assignee: Ying Han
> NoOpVertexManager schedules tasks that are not ready to run
> -----------------------------------------------------------
>
> Key: TEZ-4060
> URL: https://issues.apache.org/jira/browse/TEZ-4060
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.9.1
> Reporter: Adrian Nicoara
> Assignee: Ying Han
> Priority: Major
>
> During recovery, vertices that have already been reconfigured are assigned a
> NoOpVertexManager:
> [https://github.com/apache/tez/blob/8395a9560a131799f1af49b26e1f10f12ef48752/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java#L2689-L2711]
> [https://github.com/apache/tez/blob/8395a9560a131799f1af49b26e1f10f12ef48752/tez-dag/src/main/java/org/apache/tez/dag/app/RecoveryParser.java#L970-L972]
> The NoOpVertexManager directly schedules tasks upon being started:
> [https://github.com/apache/tez/blob/8395a9560a131799f1af49b26e1f10f12ef48752/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java#L4628]
> However, for a large graph, we can end up with all vertices configured and
> started before many of their inputs (for vertices that are not attached to
> the roots) are generated.
> This schedules tasks that are not ready to run, and they will keep failing
> until their inputs are generated.
> In addition to bypassing input dependency checking, which is generally done
> in VertexManagerPlugin#onSourceTaskCompleted, we lose the ability to
> execute custom logic within our own VertexManagerPlugins that is needed to
> configure downstream vertices. This is because we communicate some graph
> configuration metadata through global objects that are populated through
> calls to VertexManagerPlugin#onVertexStateUpdated.
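
To make the difference concrete, here is a minimal, self-contained sketch of the two scheduling policies described above. It does not use the Tez API: the class SchedulingSketch, the method scheduleOnStart, and the class WaitForSourcesManager are hypothetical names invented for illustration, and "wait for every source task" is an assumed, simplest-possible readiness rule rather than what any particular Tez vertex manager does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: none of these types exist in Apache Tez.
public class SchedulingSketch {

    // Hypothetical helper: task indices 0..numTasks-1.
    static List<Integer> allTasks(int numTasks) {
        List<Integer> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(i);
        }
        return tasks;
    }

    // Eager policy (the behaviour the issue describes): every task is
    // scheduled as soon as the vertex starts, whether or not inputs exist.
    static List<Integer> scheduleOnStart(int numTasks) {
        return allTasks(numTasks);
    }

    // Dependency-aware policy (assumed rule): schedule nothing until every
    // source task has reported completion, then schedule all tasks once.
    static class WaitForSourcesManager {
        private final int numSourceTasks;
        private final int numTasks;
        private final Set<Integer> completedSources = new HashSet<>();
        private boolean scheduled = false;

        WaitForSourcesManager(int numSourceTasks, int numTasks) {
            this.numSourceTasks = numSourceTasks;
            this.numTasks = numTasks;
        }

        // Called when the vertex starts; returns the tasks to schedule now.
        List<Integer> onVertexStarted() {
            return scheduleIfReady();
        }

        // Called once per finished source task; returns the tasks to schedule now.
        List<Integer> onSourceTaskCompleted(int sourceTaskId) {
            completedSources.add(sourceTaskId);
            return scheduleIfReady();
        }

        private List<Integer> scheduleIfReady() {
            if (scheduled || completedSources.size() < numSourceTasks) {
                return Collections.emptyList();
            }
            scheduled = true;
            return allTasks(numTasks);
        }
    }

    public static void main(String[] args) {
        System.out.println("eager, on start: " + scheduleOnStart(3));
        WaitForSourcesManager manager = new WaitForSourcesManager(2, 3);
        System.out.println("waiting, on start: " + manager.onVertexStarted());
        System.out.println("after source 0: " + manager.onSourceTaskCompleted(0));
        System.out.println("after source 1: " + manager.onSourceTaskCompleted(1));
    }
}
```

With the eager policy, a recovered non-root vertex hands out all of its tasks immediately; with the waiting policy, the same vertex hands out nothing until its sources have finished, which is the input dependency check the description says is bypassed.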