[
https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527033#comment-14527033
]
Siddharth Seth commented on TEZ-2379:
-------------------------------------
One thing to consider here is that the individual state machines should be
complete in themselves, and should not make assumptions about other state
machines. This makes them a lot easier to reason about (we aren't there yet,
though).
TaskImpl
- Already knows how to handle ATTEMPT_KILLED and ATTEMPT_FAILED in the SUCCESS
state. It will, however, error out if these events arrive in the FAILED or
KILLED state, even though there is nothing to be done for them there.
TaskAttemptImpl
- If moving from one 'external' state to another, it should inform the Task
and let the Task deal with the state change.
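
To make the two points above concrete, here is a minimal sketch in plain Java.
It is not the Tez code and does not use Hadoop's StateMachineFactory; every
class, method and event name in it (StateMachineSketch, Task, Attempt, moveTo,
T_TERMINATE and so on) is made up for illustration, and the self-loop in the
SUCCEEDED state is a simplification. The only things it tries to show are that
the task's transition table covers late attempt events in every terminal
state, and that the attempt reports each externally visible state change
without guessing what state the task is in.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch; names and transitions are assumptions, not Tez code.
public class StateMachineSketch {
    public enum TaskState { RUNNING, SUCCEEDED, FAILED, KILLED }
    public enum TaskEvent { T_ATTEMPT_SUCCEEDED, T_ATTEMPT_FAILED, T_ATTEMPT_KILLED, T_TERMINATE }
    public enum AttemptState { RUNNING, SUCCEEDED, FAILED, KILLED }

    public static final class Task {
        private final Map<TaskState, Map<TaskEvent, TaskState>> table =
            new EnumMap<>(TaskState.class);
        private TaskState state = TaskState.RUNNING;

        public Task() {
            arc(TaskState.RUNNING, TaskEvent.T_ATTEMPT_SUCCEEDED, TaskState.SUCCEEDED);
            arc(TaskState.RUNNING, TaskEvent.T_ATTEMPT_FAILED, TaskState.FAILED);
            arc(TaskState.RUNNING, TaskEvent.T_ATTEMPT_KILLED, TaskState.KILLED);
            arc(TaskState.RUNNING, TaskEvent.T_TERMINATE, TaskState.KILLED);
            // Terminal states declare self-loops for late attempt events, so the
            // table is complete no matter what the attempts do afterwards.
            for (TaskState terminal : new TaskState[] {
                    TaskState.SUCCEEDED, TaskState.FAILED, TaskState.KILLED}) {
                arc(terminal, TaskEvent.T_ATTEMPT_FAILED, terminal);
                arc(terminal, TaskEvent.T_ATTEMPT_KILLED, terminal);
            }
        }

        private void arc(TaskState from, TaskEvent on, TaskState to) {
            table.computeIfAbsent(from, k -> new EnumMap<>(TaskEvent.class)).put(on, to);
        }

        public void handle(TaskEvent event) {
            Map<TaskEvent, TaskState> arcs = table.get(state);
            TaskState next = (arcs == null) ? null : arcs.get(event);
            if (next == null) {
                // An undeclared (state, event) pair is still treated as an error.
                throw new IllegalStateException("Invalid event: " + event + " at " + state);
            }
            state = next;
        }

        public TaskState getState() { return state; }
    }

    public static final class Attempt {
        private final Task task;
        private AttemptState state = AttemptState.RUNNING;

        public Attempt(Task task) { this.task = task; }

        // Every change of externally visible state is reported to the task;
        // the attempt does not check the task's state before sending.
        public void moveTo(AttemptState next) {
            if (next == state) {
                return;
            }
            state = next;
            switch (next) {
                case SUCCEEDED: task.handle(TaskEvent.T_ATTEMPT_SUCCEEDED); break;
                case FAILED:    task.handle(TaskEvent.T_ATTEMPT_FAILED);    break;
                case KILLED:    task.handle(TaskEvent.T_ATTEMPT_KILLED);    break;
                default:        break;
            }
        }
    }

    public static void main(String[] args) {
        Task task = new Task();
        Attempt attempt = new Attempt(task);
        task.handle(TaskEvent.T_TERMINATE);    // job killed mid-execution
        attempt.moveTo(AttemptState.KILLED);   // late T_ATTEMPT_KILLED at KILLED
        System.out.println(task.getState());
    }
}
```

In this sketch, the sequence from the report (task already KILLED, then a
T_ATTEMPT_KILLED arrives) is absorbed by the task's own table instead of
raising an invalid-transition error, and the attempt needs no knowledge of
the task's state to stay correct.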
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event:
> T_ATTEMPT_KILLED at KILLED
> ------------------------------------------------------------------------------------------------------
>
> Key: TEZ-2379
> URL: https://issues.apache.org/jira/browse/TEZ-2379
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Rajesh Balamohan
> Assignee: Hitesh Shah
> Priority: Blocker
> Attachments: TEZ-2379.1.patch, TEZ-2379.2.patch, TEZ-2379.3.patch
>
>
> {noformat}
> 2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: Can't handle this event at current state for task_1429683757595_0479_1_03_000013
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
>     at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853)
>     at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106)
>     at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874)
>     at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860)
>     at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
>     at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Additional notes:
> ============
> Hive - latest build
> Tez - master
> tpch-200 gb scale q_17 (kill the job in the middle of execution)