[
https://issues.apache.org/jira/browse/TEZ-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526931#comment-14526931
]
Hitesh Shah edited comment on TEZ-2404 at 5/4/15 5:54 PM:
----------------------------------------------------------
Bumping up priority as this means recovery is potentially broken.
[~zjffdu] It looks like we need a recovery-related test to ensure that all data
movement events are always stored before a task completion event.
was (Author: hitesh):
Bumping up priority as this means recovery is potentially broken.
[~zjffdu] It looks like we need a recovery-related test to ensure that data
movement events are always stored before a task completion event.
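The property that such a recovery test would assert can be sketched as below. This is only an illustration, not Tez code: LoggedEvent, its Kind enum, and RecoveryOrderCheck are hypothetical stand-ins for whatever the recovery log actually stores, and the attempt IDs are made up.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for one stored recovery entry; not a Tez class. */
final class LoggedEvent {
    enum Kind { DATA_MOVEMENT, TASK_ATTEMPT_COMPLETED }

    final Kind kind;
    final String attemptId;

    LoggedEvent(Kind kind, String attemptId) {
        this.kind = kind;
        this.attemptId = attemptId;
    }
}

final class RecoveryOrderCheck {
    /**
     * Returns true if no DATA_MOVEMENT entry is stored after the
     * TASK_ATTEMPT_COMPLETED entry of the same attempt.
     */
    static boolean dataMovementPrecedesCompletion(List<LoggedEvent> log) {
        Set<String> completedAttempts = new HashSet<>();
        for (LoggedEvent e : log) {
            if (e.kind == LoggedEvent.Kind.TASK_ATTEMPT_COMPLETED) {
                completedAttempts.add(e.attemptId);
            } else if (completedAttempts.contains(e.attemptId)) {
                // A data movement entry was stored after its attempt completed.
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Completion stored first: the ordering this issue is about.
        List<LoggedEvent> log = Arrays.asList(
            new LoggedEvent(LoggedEvent.Kind.TASK_ATTEMPT_COMPLETED, "attempt_1"),
            new LoggedEvent(LoggedEvent.Kind.DATA_MOVEMENT, "attempt_1"));
        System.out.println(dataMovementPrecedesCompletion(log)); // prints false
    }
}
```

A real test would read the entries back from the recovery log after a run instead of building the list by hand.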
> Handle DataMovementEvent before its TaskAttemptCompletedEvent
> -------------------------------------------------------------
>
> Key: TEZ-2404
> URL: https://issues.apache.org/jira/browse/TEZ-2404
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jeff Zhang
> Assignee: Jeff Zhang
> Priority: Critical
> Attachments: TEZ-2404-1.patch, TEZ-2404-2.patch
>
>
> TEZ-2325 routes TASK_ATTEMPT_COMPLETED_EVENT directly to the attempt, but this
> causes a recovery issue. Recovery requires that a DataMovementEvent be handled
> before its TaskAttemptCompletedEvent; otherwise the DataMovementEvent may be
> lost during recovery, causing its dependent tasks to hang.
> There are two ways to fix this issue:
> 1. Still route TaskAttemptCompletedEvent in Vertex
> 2. Route DataMovementEvent before TaskAttemptCompletedEvent in
> TezTaskAttemptListener
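The second option can be illustrated with a small sketch. It is not the attached patch and does not use Tez classes: TaskEvent, its Kind enum, and EventOrdering are hypothetical names. It also assumes that an attempt's events arrive together in one batch, which may not match how TezTaskAttemptListener actually receives them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for an event reported by a task attempt; not a Tez class. */
final class TaskEvent {
    enum Kind { DATA_MOVEMENT, TASK_ATTEMPT_COMPLETED }

    final Kind kind;
    final String attemptId;

    TaskEvent(Kind kind, String attemptId) {
        this.kind = kind;
        this.attemptId = attemptId;
    }
}

final class EventOrdering {
    /**
     * Returns the batch reordered so that every completion event comes after
     * all other events. Relative order within each group is preserved.
     */
    static List<TaskEvent> dataMovementFirst(List<TaskEvent> batch) {
        List<TaskEvent> ordered = new ArrayList<>();
        List<TaskEvent> completions = new ArrayList<>();
        for (TaskEvent e : batch) {
            if (e.kind == TaskEvent.Kind.TASK_ATTEMPT_COMPLETED) {
                completions.add(e); // hold back until data movement is routed
            } else {
                ordered.add(e);
            }
        }
        ordered.addAll(completions);
        return ordered;
    }

    public static void main(String[] args) {
        List<TaskEvent> batch = Arrays.asList(
            new TaskEvent(TaskEvent.Kind.TASK_ATTEMPT_COMPLETED, "attempt_1"),
            new TaskEvent(TaskEvent.Kind.DATA_MOVEMENT, "attempt_1"));
        for (TaskEvent e : dataMovementFirst(batch)) {
            System.out.println(e.kind + " " + e.attemptId);
        }
    }
}
```

Routing the events in the returned order means the data movement event is handled, and so can be stored for recovery, before the attempt is marked complete.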