[
https://issues.apache.org/jira/browse/TEZ-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541173#comment-14541173
]
Jeff Zhang edited comment on TEZ-2426 at 5/13/15 1:38 AM:
----------------------------------------------------------
[~sseth] I think the issue may still exist when fault tolerance kicks in
in local mode. E.g. task_attempt_1 fails and cleans up the spec objects,
then task_attempt_2 gets an empty inputSpec/outputSpec/GroupInputSpec.
Or should we disable the cleanup in local mode?
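
To make the concern concrete, here is a minimal sketch of the scenario. It
assumes that in local mode every attempt receives the same in-memory spec
object. The class and method names (SharedSpecSketch, Spec, runAttempt) are
hypothetical and are not Tez classes; the per-attempt copy is only one
possible workaround, not the fix chosen in the patch.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedSpecSketch {
    // Hypothetical stand-in for a task spec; this is NOT the Tez TaskSpec class.
    static class Spec {
        final List<String> inputSpecs = new ArrayList<>();

        Spec(List<String> inputs) {
            inputSpecs.addAll(inputs);
        }

        // Independent copy, so cleanup by one attempt cannot affect another.
        Spec copy() {
            return new Spec(inputSpecs);
        }

        // Mimics the end-of-attempt cleanup that empties the spec lists.
        void cleanup() {
            inputSpecs.clear();
        }
    }

    // Runs one attempt: records how many inputs it saw, then cleans up the spec.
    static int runAttempt(Spec spec) {
        int seen = spec.inputSpecs.size();
        spec.cleanup();
        return seen;
    }

    public static void main(String[] args) {
        // Same object reused across attempts (the assumed local-mode behaviour).
        Spec shared = new Spec(List.of("input1", "input2"));
        System.out.println("attempt 1 saw " + runAttempt(shared) + " inputs");
        System.out.println("attempt 2 saw " + runAttempt(shared) + " inputs");

        // One possible workaround: hand each attempt its own copy.
        Spec original = new Spec(List.of("input1", "input2"));
        System.out.println("attempt 1 saw " + runAttempt(original.copy()) + " inputs");
        System.out.println("attempt 2 saw " + runAttempt(original.copy()) + " inputs");
    }
}
```

With the shared object the second attempt sees no inputs. With per-attempt
copies both attempts see the full list, at the cost of one extra copy per
attempt.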
was (Author: zjffdu):
[~sseth] I think the issue may still exist when fault tolerance kicks in
in local mode. E.g. task_attempt_1 fails and cleans up the spec objects,
then task_attempt_2 gets an empty inputSpec/outputSpec/GroupInputSpec.
> Ensure the eventRouter thread completes before switching to a new task and
> thread safety fixes in IPOContexts.
> --------------------------------------------------------------------------------------------------------------
>
> Key: TEZ-2426
> URL: https://issues.apache.org/jira/browse/TEZ-2426
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Bikas Saha
> Assignee: Siddharth Seth
> Priority: Critical
> Fix For: 0.7.0
>
> Attachments: TEZ-2426-3.patch, TEZ-2426.1.txt, TEZ-2426.2.txt,
> TEZ-2426.addendum.txt, am.log, container.log
>
>
> Sequence of events
> 1) Task A starts in a container
> 2) Task A complete event comes to AM
> 3) Task B starts in the same container
> 4) Task A's input calls some method on its context. Crashes with NPE
> 5) The crash sends an input failed event for Task A to the AM
> 6) Task A state machine crashes saying cannot handle failed after success
> In some cases, a status update event may also be sent after completion,
> though it is not clear whether that is related to the failed event being
> sent.
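
The idea in the issue title, letting the event router finish before the
container switches to the next task, can be sketched as below. This is an
illustration only: RouterDrainSketch and runSequentially are hypothetical
names, the router is modelled as a single-thread executor, and none of this
is taken from the attached patches.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class RouterDrainSketch {
    // Runs the tasks one after another in the same "container". Each task gets
    // its own router thread, which must terminate before the next task starts.
    static List<String> runSequentially(List<String> taskNames) {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        for (String name : taskNames) {
            ExecutorService router = Executors.newSingleThreadExecutor();
            log.add(name + " started");
            // Stand-in for routing an event to the task's inputs/outputs.
            router.execute(() -> log.add(name + " event routed"));
            // Stop accepting events, then wait for pending ones to drain.
            router.shutdown();
            try {
                if (!router.awaitTermination(10, TimeUnit.SECONDS)) {
                    throw new IllegalStateException("router for " + name + " did not stop");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while draining router", e);
            }
            log.add(name + " finished");
        }
        return log;
    }

    public static void main(String[] args) {
        for (String line : runSequentially(List.of("A", "B"))) {
            System.out.println(line);
        }
    }
}
```

Because the router is drained before the task is marked finished, no event
from Task A can be delivered once Task B has started, which is the ordering
that steps 3 and 4 above violate.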