[ 
https://issues.apache.org/jira/browse/TEZ-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140401#comment-14140401
 ] 

Siddharth Seth commented on TEZ-1592:
-------------------------------------

bq. When I applied a modified version of this patch to tez-0.5.0 branch (thanks 
Bikas Saha), the tests go through fine. It looks like the getting stuck part is 
related to tez-0.6-SNAPSHOT.
[~vikram.dixit], [~bikassaha] - what was the modified version of the patch 
doing, and did you figure out why things were getting stuck? I'm not sure why 
this would get stuck, or how it is related to 0.6.0-SNAPSHOT.

[~bikassaha] - I'm aware that initWaitsForRootInitializers was specialized for 
the case where parallelism etc. was set up (primarily the Distributor). That's 
primarily because we don't know if this path will make any changes. It's 
absolutely possible to use this path and change the parallelism.
In any case, we have to end up waiting for events to come in before processing.

The current code ends up not waiting for all Initializers to complete - and 
moves to the next state the moment parallelism is updated. That's what causes 
the initializer event to show up in RUNNING state. We could either handle the 
event in RUNNING state - or take the approach (in the patch) which doesn't move 
out of INITING till all Initializers are complete. This is more consistent with 
the case where parallelism is set up front and we end up waiting for all 
initializers, and IMO a better approach until we have a better way for 
Initializers / VMs to signal to vertices that they're ready to go.
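
As a rough illustration of the second approach, here is a minimal sketch. It 
is not the actual VertexImpl code or the attached patch: the class 
VertexSketch, its methods, and the string initializer names are all 
hypothetical, and only the state names come from the discussion above. The 
vertex records a parallelism update but stays in INITING until every 
initializer has reported back.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of the idea; NOT the real Tez VertexImpl.
class VertexSketch {
    enum State { INITING, INITED }

    private State state = State.INITING;
    private final Set<String> pendingInitializers = new HashSet<>();
    private boolean parallelismSet = false;

    VertexSketch(Set<String> initializerNames) {
        pendingInitializers.addAll(initializerNames);
    }

    // A parallelism update is recorded, but on its own it does not
    // move the vertex out of INITING.
    void onParallelismUpdated() {
        parallelismSet = true;
        maybeFinishInit();
    }

    // Called once per initializer when it completes.
    void onRootInputInitialized(String initializerName) {
        if (state != State.INITING) {
            // Mirrors the failure in the report: the event arrives
            // after the vertex has already left INITING.
            throw new IllegalStateException(
                "Invalid event: V_ROOT_INPUT_INITIALIZED at " + state);
        }
        pendingInitializers.remove(initializerName);
        maybeFinishInit();
    }

    // Transition only when parallelism is known AND no initializer is pending.
    private void maybeFinishInit() {
        if (state == State.INITING && parallelismSet
                && pendingInitializers.isEmpty()) {
            state = State.INITED;
        }
    }

    State getState() {
        return state;
    }
}
```

With two initializers registered, a parallelism update followed by a single 
completion leaves the sketch in INITING; only the second completion moves it 
to INITED, so no initializer event can arrive in a later state.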

> Vertex should wait for all initializers to finish before moving to INITED state
> -------------------------------------------------------------------------------
>
>                 Key: TEZ-1592
>                 URL: https://issues.apache.org/jira/browse/TEZ-1592
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>            Priority: Critical
>         Attachments: TEZ-1592.1.txt
>
>
> Reported by [~vikram.dixit]
> When using multiple initializers, the following stack trace is seen at times.
> {code}
> 2014-09-17 15:05:00,406 ERROR [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Can't handle Invalid event V_ROOT_INPUT_INITIALIZED on vertex Map 2 with vertexId vertex_1410991351910_0002_8_01 at current state RUNNING
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: V_ROOT_INPUT_INITIALIZED at RUNNING
>   at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>   at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1337)
>   at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:168)
>   at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1641)
>   at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1627)
>   at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>   at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>   at java.lang.Thread.run(Thread.java:662)
> {code}


