[ https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518337#comment-14518337 ]

Bikas Saha commented on TEZ-2379:
---------------------------------

1) The client issued a DAG kill, which caused all tasks to be killed.
2) The task sent a kill request to its attempt and started waiting for the
attempt to finish.
3) The attempt succeeded and sent done.
4) The task received the attempt success and went into the KILLED state because
all its attempts were done.
5) The attempt received the kill request. It honored that request in
TerminatedAfterSuccessTransition and sent killed back to the task.
6) The task received the attempt-killed event (T_ATTEMPT_KILLED) in the KILLED
state, and that transition is not handled.
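The six steps can be replayed with a toy model. This is a minimal sketch, not Tez code: `KillRaceSketch`, `KILL_WAIT`, `TA_DONE`, `TA_KILL_REQUEST` and `T_ATTEMPT_SUCCEEDED` are hypothetical simplifications, and a single FIFO queue stands in for the central asynchronous dispatcher. With this queue ordering the attempt handles the kill (step 5) before the task handles the success (step 4); in either order the task is already KILLED when T_ATTEMPT_KILLED arrives. The `ignoreKillAfterSuccess` flag models the behaviour proposed below.

```java
import java.util.ArrayDeque;

/**
 * Toy replay of the TEZ-2379 sequence; NOT Tez code. Class, state and event
 * names are hypothetical simplifications. One FIFO queue stands in for the
 * central async dispatcher, so the attempt's completion and the task's kill
 * request cross in flight.
 */
public class KillRaceSketch {
    enum TaskState { RUNNING, KILL_WAIT, KILLED }
    enum AttemptState { RUNNING, SUCCEEDED, KILLED }
    enum Event { T_KILL, TA_DONE, TA_KILL_REQUEST, T_ATTEMPT_SUCCEEDED, T_ATTEMPT_KILLED }

    private TaskState task = TaskState.RUNNING;
    private AttemptState attempt = AttemptState.RUNNING;
    private final ArrayDeque<Event> queue = new ArrayDeque<>();

    /** Returns the first unhandled-event error, or the task's final state. */
    static String run(boolean ignoreKillAfterSuccess) {
        KillRaceSketch m = new KillRaceSketch();
        m.queue.add(Event.T_KILL);  // step 1: DAG kill reaches the task
        m.queue.add(Event.TA_DONE); // step 3: the attempt's "done" is already queued
        while (!m.queue.isEmpty()) {
            String error = m.dispatch(m.queue.poll(), ignoreKillAfterSuccess);
            if (error != null) {
                return error;
            }
        }
        return "task " + m.task;
    }

    /** Handles one event; returns an error message if no transition exists. */
    private String dispatch(Event e, boolean ignoreKillAfterSuccess) {
        switch (e) {
            case T_KILL:              // step 2: ask the attempt to die, then wait
                task = TaskState.KILL_WAIT;
                queue.add(Event.TA_KILL_REQUEST);
                return null;
            case TA_DONE:             // step 3: attempt succeeds and tells the task
                attempt = AttemptState.SUCCEEDED;
                queue.add(Event.T_ATTEMPT_SUCCEEDED);
                return null;
            case TA_KILL_REQUEST:     // step 5: kill reaches a succeeded attempt
                if (attempt == AttemptState.SUCCEEDED && ignoreKillAfterSuccess) {
                    return null;      // proposed behaviour: drop the stale kill
                }
                attempt = AttemptState.KILLED;
                queue.add(Event.T_ATTEMPT_KILLED);
                return null;
            case T_ATTEMPT_SUCCEEDED: // step 4: the only attempt is done => KILLED
                if (task == TaskState.KILL_WAIT) {
                    task = TaskState.KILLED;
                }
                return null;
            case T_ATTEMPT_KILLED:    // step 6: nothing handles this in KILLED
                if (task == TaskState.KILLED) {
                    return "Invalid event: T_ATTEMPT_KILLED at KILLED";
                }
                task = TaskState.KILLED;
                return null;
            default:
                throw new IllegalArgumentException("unexpected event " + e);
        }
    }

    public static void main(String[] args) {
        System.out.println("current:  " + run(false)); // Invalid event: T_ATTEMPT_KILLED at KILLED
        System.out.println("proposed: " + run(true));  // task KILLED
    }
}
```

Running `main` reproduces the reported error with the current behaviour and ends cleanly in KILLED when the stale kill is dropped.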

From what I see in the code, step 5 seems to be the problem here. The attempt
should ignore the kill request if it is already done. An attempt is killed
either when a different attempt has succeeded and this attempt is no longer
needed, or when the task is killed. A retroactive task kill, in which a
successful task is killed (say, in order to rerun it after a node failure),
does not use this flow. So unless we can think of other use cases for a
successful attempt transitioning to killed, the attempt should ignore a kill
request once it has succeeded.
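The proposal amounts to registering a kill request in the succeeded state as a no-op. A minimal sketch, assuming a table-driven machine in the spirit of YARN's `StateMachineFactory`, where an unregistered state/event pair is an error; `AttemptKillGuardSketch`, `TA_DONE`, `TA_KILL_REQUEST` and the `Transition` interface are hypothetical names, not the real `TaskAttemptImpl` wiring.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Hypothetical table-driven attempt state machine in the spirit of YARN's
 * StateMachineFactory: an unregistered (state, event) pair is an error.
 * NOT the real TaskAttemptImpl wiring; names are illustrative only.
 */
public class AttemptKillGuardSketch {
    enum State { RUNNING, SUCCEEDED, KILLED }
    enum Event { TA_DONE, TA_KILL_REQUEST }

    /** Chooses the next state; may send an event name back to the parent task. */
    interface Transition {
        State apply(Consumer<String> toTask);
    }

    private final Map<State, Map<Event, Transition>> table = new EnumMap<>(State.class);
    private State state = State.RUNNING;

    AttemptKillGuardSketch() {
        register(State.RUNNING, Event.TA_DONE, toTask -> {
            toTask.accept("T_ATTEMPT_SUCCEEDED");
            return State.SUCCEEDED;
        });
        register(State.RUNNING, Event.TA_KILL_REQUEST, toTask -> {
            toTask.accept("T_ATTEMPT_KILLED");
            return State.KILLED;
        });
        // Proposed guard: a late kill request leaves a succeeded attempt alone
        // and sends nothing back, so the task never sees T_ATTEMPT_KILLED.
        register(State.SUCCEEDED, Event.TA_KILL_REQUEST, toTask -> State.SUCCEEDED);
        // Duplicate kill requests to an already-killed attempt are also ignored.
        register(State.KILLED, Event.TA_KILL_REQUEST, toTask -> State.KILLED);
    }

    private void register(State from, Event on, Transition t) {
        table.computeIfAbsent(from, s -> new EnumMap<>(Event.class)).put(on, t);
    }

    /** Applies the registered transition, or fails like an invalid transition. */
    State handle(Event e, Consumer<String> toTask) {
        Transition t = table.getOrDefault(state, Collections.emptyMap()).get(e);
        if (t == null) {
            throw new IllegalStateException("Invalid event: " + e + " at " + state);
        }
        state = t.apply(toTask);
        return state;
    }

    public static void main(String[] args) {
        List<String> sentToTask = new ArrayList<>();
        AttemptKillGuardSketch attempt = new AttemptKillGuardSketch();
        attempt.handle(Event.TA_DONE, sentToTask::add);         // attempt succeeds first
        attempt.handle(Event.TA_KILL_REQUEST, sentToTask::add); // late kill from the task
        System.out.println(attempt.state + " " + sentToTask);   // SUCCEEDED [T_ATTEMPT_SUCCEEDED]
    }
}
```

Making the stale kill a registered no-op, rather than leaving it unregistered, keeps the attempt in SUCCEEDED without raising an error, and the task only ever sees the success it has already acted on.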

> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
> ------------------------------------------------------------------------------------------------------
>
>                 Key: TEZ-2379
>                 URL: https://issues.apache.org/jira/browse/TEZ-2379
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Priority: Blocker
>
> {noformat}
> 2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: Can't handle this event at current state for task_1429683757595_0479_1_03_000013
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
>         at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853)
>         at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106)
>         at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874)
>         at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860)
>         at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
>         at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Additional notes:
> ============
> Hive - latest build 
> Tez - master
> TPC-H 200 GB scale, query 17 (kill the job in the middle of execution)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
