[
https://issues.apache.org/jira/browse/TEZ-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140401#comment-14140401
]
Siddharth Seth commented on TEZ-1592:
-------------------------------------
bq. When I applied a modified version of this patch to tez-0.5.0 branch (thanks
Bikas Saha), the tests go through fine. It looks like the getting stuck part is
related to tez-0.6-SNAPSHOT.
[~vikram.dixit], [~bikassaha] - what was the modified version of the patch
doing, and did you figure out why things were getting stuck? Not sure why this
would get stuck or be related to 0.6.0-SNAPSHOT.
[~bikassaha] - I'm aware that initWaitsForRootInitializers was specialized for
the case where parallelism etc. was set up (primarily the Distributor). That's
primarily because we don't know whether this path will make any changes. It's
absolutely possible to use this path and change the parallelism.
In any case, we have to end up waiting for events to come in before processing.
The current code ends up not waiting for all Initializers to complete - and
moves to the next state the moment parallelism is updated. That's what causes
the initializer event to show up in RUNNING state. We could either handle the
event in RUNNING state - or take the approach (in the patch) which doesn't move
out of INITING till all Initializers are complete. This is more consistent with
the case where parallelism is set up up front and we end up waiting for all
initializers, and IMO a better approach until we have a better way for
Initializers / VMs to signal to vertices that they're ready to go.
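To illustrate the second approach, here is a minimal sketch of a vertex that stays in INITING until parallelism is known and every initializer has reported in. This is not the VertexImpl code or the attached patch: the class VertexSketch, its methods, and the input names are hypothetical, and only the state names and the V_ROOT_INPUT_INITIALIZED event name are taken from this issue.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, simplified stand-in for a vertex state machine (not Tez's VertexImpl). */
class VertexSketch {
    enum State { INITING, INITED, RUNNING }

    // Names of root input initializers that have not reported completion yet.
    private final Set<String> pendingInitializers;
    private boolean parallelismSet = false;
    private State state = State.INITING;

    VertexSketch(Collection<String> initializerNames) {
        this.pendingInitializers = new HashSet<>(initializerNames);
    }

    /** Parallelism became known; on its own this must not complete initialization. */
    void onParallelismUpdated() {
        parallelismSet = true;
        maybeFinishInit();
    }

    /** Stand-in for handling V_ROOT_INPUT_INITIALIZED for one input. */
    void onRootInputInitialized(String inputName) {
        if (state != State.INITING) {
            // Mirrors the failure in the report: the event arrives after INITING was left.
            throw new IllegalStateException(
                "Invalid event: V_ROOT_INPUT_INITIALIZED at " + state);
        }
        pendingInitializers.remove(inputName);
        maybeFinishInit();
    }

    /** Leave INITING only when parallelism is set AND no initializer is pending. */
    private void maybeFinishInit() {
        if (parallelismSet && pendingInitializers.isEmpty()) {
            state = State.INITED;
        }
    }

    void start() {
        if (state != State.INITED) {
            throw new IllegalStateException("Cannot start at " + state);
        }
        state = State.RUNNING;
    }

    State state() {
        return state;
    }

    public static void main(String[] args) {
        VertexSketch v = new VertexSketch(Arrays.asList("input1", "input2"));
        v.onParallelismUpdated();            // still INITING: two initializers pending
        v.onRootInputInitialized("input1");  // still INITING: one initializer pending
        v.onRootInputInitialized("input2");  // all done, moves to INITED
        v.start();
        System.out.println(v.state());
    }
}
```

If maybeFinishInit() checked only parallelismSet, the vertex in this sketch would leave INITING after the first call, and a later initializer event would hit the IllegalStateException branch, which is the analogue of the "Invalid event: V_ROOT_INPUT_INITIALIZED at RUNNING" error in the report.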
> Vertex should wait for all initializers to finish before moving to INITED
> state
> -------------------------------------------------------------------------------
>
> Key: TEZ-1592
> URL: https://issues.apache.org/jira/browse/TEZ-1592
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Priority: Critical
> Attachments: TEZ-1592.1.txt
>
>
> Reported by [~vikram.dixit]
> When using multiple initializers, the following stack trace is seen at times.
> {code}
> 2014-09-17 15:05:00,406 ERROR [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Can't handle Invalid event V_ROOT_INPUT_INITIALIZED on vertex Map 2 with vertexId vertex_1410991351910_0002_8_01 at current state RUNNING
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: V_ROOT_INPUT_INITIALIZED at RUNNING
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1337)
> at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:168)
> at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1641)
> at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1627)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:662)
> {code}