[ https://issues.apache.org/jira/browse/TEZ-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bikas Saha reopened TEZ-1143:
-----------------------------


The code assumes that a source split cannot happen if the vertex is already 
inited (i.e., its parallelism is already set). The previous patch fixed the case 
for the RUNNING state but not for the INITED state.
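The gap can be pictured as a missing entry in a table-driven state machine: every (state, event) pair needs a registered transition, and an unregistered pair is rejected, much like the `InvalidStateTransitonException` in the quoted stack trace. The sketch uses plain Java enums and an `EnumMap`; it is not the YARN `StateMachineFactory` API, the states are simplified stand-ins for those in `VertexImpl`, and which states the earlier patch actually registered is assumed here for illustration only.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

public class SplitTransitionSketch {
    // Simplified, hypothetical subset of vertex states and events.
    enum State { INITIALIZING, INITED, RUNNING }
    enum Event { V_ONE_TO_ONE_SOURCE_SPLIT }

    // For each state, the events that have a registered transition.
    private final Map<State, Set<Event>> transitions = new EnumMap<>(State.class);

    void register(State state, Event event) {
        transitions.computeIfAbsent(state, s -> EnumSet.noneOf(Event.class)).add(event);
    }

    // An unregistered (state, event) pair is rejected, as in the reported failure.
    void handle(State current, Event event) {
        Set<Event> allowed = transitions.getOrDefault(current, EnumSet.noneOf(Event.class));
        if (!allowed.contains(event)) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
    }

    public static void main(String[] args) {
        // Assumed registration after the earlier patch: RUNNING covered, INITED not.
        SplitTransitionSketch partial = new SplitTransitionSketch();
        partial.register(State.INITIALIZING, Event.V_ONE_TO_ONE_SOURCE_SPLIT);
        partial.register(State.RUNNING, Event.V_ONE_TO_ONE_SOURCE_SPLIT);
        try {
            partial.handle(State.INITED, Event.V_ONE_TO_ONE_SOURCE_SPLIT);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Invalid event: V_ONE_TO_ONE_SOURCE_SPLIT at INITED
        }

        // Registering the event for every state where it can arrive closes the gap.
        SplitTransitionSketch complete = new SplitTransitionSketch();
        for (State s : State.values()) {
            complete.register(s, Event.V_ONE_TO_ONE_SOURCE_SPLIT);
        }
        complete.handle(State.INITED, Event.V_ONE_TO_ONE_SOURCE_SPLIT);
        complete.handle(State.RUNNING, Event.V_ONE_TO_ONE_SOURCE_SPLIT);
        System.out.println("split event accepted in INITED and RUNNING");
    }
}
```

A lookup table keeps the rejection explicit, which is why a new arrival order shows up as a hard error rather than a silently ignored event.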

> 1-1 source split event should be handled in Vertex.RUNNING state
> ----------------------------------------------------------------
>
>                 Key: TEZ-1143
>                 URL: https://issues.apache.org/jira/browse/TEZ-1143
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Daniel Dai
>            Assignee: Bikas Saha
>             Fix For: 0.5.0
>
>         Attachments: TEZ-1143.1.patch, TEZ-1143.2.patch, TEZ-1143.3.patch, 
> syslog_dag_1400696568249_0001_1
>
>
> One-to-one edges fail when the parallelism of the source vertex changes 
> dynamically (through a ShuffleVertexManager). Here is the stack trace:
> {code}
> 2014-05-21 00:05:55,284 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Vertex vertex_1400646157236_0012_1_03 parallelism set to 1 from 20
> 2014-05-21 00:05:55,284 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Removing task: task_1400646157236_0012_1_03_000001
> 2014-05-21 00:05:55,284 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Removing task: task_1400646157236_0012_1_03_000002
> 2014-05-21 00:05:55,284 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Removing task: task_1400646157236_0012_1_03_000003
> 2014-05-21 00:05:55,284 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Removing task: task_1400646157236_0012_1_03_000004
> 2014-05-21 00:05:55,284 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Removing task: task_1400646157236_0012_1_03_000005
> 2014-05-21 00:05:55,284 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Removing task: task_1400646157236_0012_1_03_000006
> 2014-05-21 00:05:55,284 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Removing task: task_1400646157236_0012_1_03_000007
> 2014-05-21 00:05:55,284 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Removing task: task_1400646157236_0012_1_03_000008
> 2014-05-21 00:05:55,284 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Removing task: task_1400646157236_0012_1_03_000009
> 2014-05-21 00:05:55,285 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Removing task: task_1400646157236_0012_1_03_000010
> 2014-05-21 00:05:55,285 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Removing task: task_1400646157236_0012_1_03_000011
> 2014-05-21 00:05:55,285 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Removing task: task_1400646157236_0012_1_03_000012
> 2014-05-21 00:05:55,285 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Removing task: task_1400646157236_0012_1_03_000013
> 2014-05-21 00:05:55,285 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Removing task: task_1400646157236_0012_1_03_000014
> 2014-05-21 00:05:55,285 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Removing task: task_1400646157236_0012_1_03_000015
> 2014-05-21 00:05:55,285 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Removing task: task_1400646157236_0012_1_03_000016
> 2014-05-21 00:05:55,285 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Removing task: task_1400646157236_0012_1_03_000017
> 2014-05-21 00:05:55,285 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Removing task: task_1400646157236_0012_1_03_000018
> 2014-05-21 00:05:55,285 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Removing task: task_1400646157236_0012_1_03_000019
> 2014-05-21 00:05:55,285 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Replacing edge manager for source:scope-41 destination: vertex_1400646157236_0012_1_03
> 2014-05-21 00:05:55,285 INFO [AsyncDispatcher event handler] org.apache.tez.dag.history.HistoryEventHandler: [HISTORY][DAG:dag_1400646157236_0012_1][Event:VERTEX_PARALLELISM_UPDATED]: vertexId=vertex_1400646157236_0012_1_03, numTasks=1, vertexLocationHint=null, edgeManagersCount=1
> 2014-05-21 00:05:55,286 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.DAGImpl: Vertex vertex_1400646157236_0012_1_02 completed., numCompletedVertices=3, numSuccessfulVertices=3, numFailedVertices=0, numKilledVertices=0, numVertices=7
> 2014-05-21 00:05:55,287 ERROR [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Can't handle Invalid event V_ONE_TO_ONE_SOURCE_SPLIT on vertex scope-61 with vertexId vertex_1400646157236_0012_1_05 at current state RUNNING
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: V_ONE_TO_ONE_SOURCE_SPLIT at RUNNING
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1263)
>         at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:158)
>         at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1716)
>         at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1702)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
>         at java.lang.Thread.run(Thread.java:695)
> {code}
> Attached is the complete AM log. scope-42 is the source vertex and scope-61 is 
> the destination vertex.
> The issue is that the code assumed that the split event would arrive before the 
> vertex starts. This does not hold in all cases. For example, if the event comes 
> from two different paths in the DAG, the vertex can start after one path sets 
> the parallelism, and only then does the second path send the event. Also, if 
> the upstream vertex is a shuffle/reduce vertex, its parallelism can change while 
> it is running, which in turn changes the current vertex's parallelism while 
> that vertex is running.
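The second scenario in the quoted description (an upstream shuffle/reduce vertex whose parallelism changes after the destination has already started) can be sketched in isolation. Everything here is a hypothetical simplification: the `Vertex` class, `setParallelism`, `onSourceSplit`, and the rule that parallelism may change only while no destination task has been scheduled are assumptions, not how `VertexImpl` implements the fix. The vertex names are borrowed from the report purely as labels.

```java
import java.util.ArrayList;
import java.util.List;

public class OneToOneParallelismSketch {
    enum State { INITED, RUNNING }

    // Hypothetical, heavily simplified vertex; not Tez's VertexImpl.
    static final class Vertex {
        final String name;
        State state = State.INITED;
        int numTasks;
        int scheduledTasks = 0; // assumption: tracked so we know when shrinking is unsafe
        // Destinations reached from this vertex over one-to-one edges.
        final List<Vertex> oneToOneDestinations = new ArrayList<>();

        Vertex(String name, int numTasks) {
            this.name = name;
            this.numTasks = numTasks;
        }

        void start() {
            state = State.RUNNING;
        }

        // Called when this vertex's parallelism changes (e.g. by a shuffle-style
        // vertex manager); the new count is pushed to one-to-one destinations.
        void setParallelism(int newNumTasks) {
            numTasks = newNumTasks;
            for (Vertex dest : oneToOneDestinations) {
                dest.onSourceSplit(newNumTasks);
            }
        }

        // Deliberately does not check `state`: the split is accepted whether the
        // vertex is INITED or RUNNING. Assumed rule: only refuse once tasks exist.
        void onSourceSplit(int sourceNumTasks) {
            if (numTasks == sourceNumTasks) {
                return; // already consistent with the source
            }
            if (scheduledTasks > 0) {
                throw new IllegalStateException(
                    name + ": cannot change parallelism after tasks were scheduled");
            }
            numTasks = sourceNumTasks;
            // Keep cascading along chains of one-to-one edges.
            for (Vertex dest : oneToOneDestinations) {
                dest.onSourceSplit(sourceNumTasks);
            }
        }
    }

    public static void main(String[] args) {
        Vertex source = new Vertex("scope-42", 20);
        Vertex dest = new Vertex("scope-61", 20);
        source.oneToOneDestinations.add(dest);

        // The destination starts before the source's parallelism is final.
        dest.start();
        source.setParallelism(1);

        System.out.println(dest.name + " state=" + dest.state + " numTasks=" + dest.numTasks);
        // prints: scope-61 state=RUNNING numTasks=1
    }
}
```

Keying the decision on whether tasks have been scheduled, rather than on the vertex state, is what lets the same handler cover both orderings described in the report.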



--
This message was sent by Atlassian JIRA
(v6.2#6252)
