[ https://issues.apache.org/jira/browse/TEZ-738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hitesh Shah updated TEZ-738:
----------------------------
Attachment: TEZ-738.2.patch
The logs show a race between preempting the container and the task completing
on that same container. The task attempt state machine handles the preemption
event after the attempt has failed, but not after it has completed
successfully. The fix adds the missing transition at SUCCEEDED and makes its
handling a no-op.
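To illustrate the idea, here is a minimal sketch assuming a simplified, hand-rolled transition table. It is not Tez's TaskAttemptImpl and does not use Hadoop's StateMachineFactory; the class, state, and event names below are hypothetical stand-ins. It shows that a late TA_CONTAINER_PREEMPTED arriving after TA_DONE is rejected when SUCCEEDED has no transition for it, and is absorbed as a no-op self-transition once one is registered.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified model of a task-attempt state machine.
// NOT Tez's TaskAttemptImpl or Hadoop's StateMachineFactory.
public class AttemptStateMachineSketch {

    public enum State { RUNNING, SUCCEEDED, FAILED, KILLED }

    public enum EventType { TA_DONE, TA_FAILED, TA_CONTAINER_PREEMPTED }

    // Thrown when no transition is registered for (state, event); plays the
    // role that InvalidStateTransitonException plays in the log above.
    public static class InvalidTransitionException extends RuntimeException {
        public InvalidTransitionException(State s, EventType e) {
            super("Invalid event: " + e + " at " + s);
        }
    }

    private final Map<State, Map<EventType, State>> table =
            new EnumMap<State, Map<EventType, State>>(State.class);
    private State current = State.RUNNING;

    public AttemptStateMachineSketch(boolean withFix) {
        addTransition(State.RUNNING, EventType.TA_DONE, State.SUCCEEDED);
        addTransition(State.RUNNING, EventType.TA_FAILED, State.FAILED);
        // Hypothetical: preempting a running attempt kills it.
        addTransition(State.RUNNING, EventType.TA_CONTAINER_PREEMPTED, State.KILLED);
        // Failure path already tolerates a late preemption (no-op).
        addTransition(State.FAILED, EventType.TA_CONTAINER_PREEMPTED, State.FAILED);
        if (withFix) {
            // The missing piece: a late preemption after success is a no-op.
            addTransition(State.SUCCEEDED, EventType.TA_CONTAINER_PREEMPTED, State.SUCCEEDED);
        }
    }

    private void addTransition(State from, EventType event, State to) {
        table.computeIfAbsent(from, k -> new EnumMap<EventType, State>(EventType.class))
             .put(event, to);
    }

    // Applies an event; throws if the current state has no transition for it.
    public State handle(EventType event) {
        Map<EventType, State> row = table.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new InvalidTransitionException(current, event);
        }
        current = next;
        return current;
    }

    public static void main(String[] args) {
        // Race: the task completes, then the preemption event arrives late.
        AttemptStateMachineSketch fixed = new AttemptStateMachineSketch(true);
        fixed.handle(EventType.TA_DONE);
        System.out.println(fixed.handle(EventType.TA_CONTAINER_PREEMPTED)); // prints SUCCEEDED

        AttemptStateMachineSketch unfixed = new AttemptStateMachineSketch(false);
        unfixed.handle(EventType.TA_DONE);
        try {
            unfixed.handle(EventType.TA_CONTAINER_PREEMPTED);
        } catch (InvalidTransitionException e) {
            // prints "Invalid event: TA_CONTAINER_PREEMPTED at SUCCEEDED"
            System.out.println(e.getMessage());
        }
    }
}
```

Making the handler a self-transition keeps the attempt's SUCCEEDED outcome intact: once the work has finished, a preemption of the container that ran it has nothing left to cancel.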
[~sseth] [~bikassaha] Review please.
> Hive query fails with Invalid event: TA_CONTAINER_PREEMPTED at SUCCEEDED
> ------------------------------------------------------------------------
>
> Key: TEZ-738
> URL: https://issues.apache.org/jira/browse/TEZ-738
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.2.0
> Reporter: Gopal V
> Assignee: Hitesh Shah
> Labels: hive
> Fix For: 0.3.0
>
> Attachments: TEZ-738.1.wip.patch, TEZ-738.2.patch, am.log.gz, container_1389745080241_0006_01_000335.log
>
>
> The Hive query fails with the following error:
> {code}
> 2014-01-15 00:58:08,324 ERROR [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1389745080241_0006_1_02_000276_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_PREEMPTED at SUCCEEDED
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.tez.dag.app.dag.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:495)
>     at org.apache.tez.dag.app.dag.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:99)
> {code}
> This happens during a large join where a few tasks seem to have failed.
> {code}
> vertexName=Map 9, taskAttemptId=attempt_1389745080241_0006_1_07_002425_0, startTime=1389747463623, finishTime=1389747481016, timeTaken=17393, status=FAILED
> {code}
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)