[ 
https://issues.apache.org/jira/browse/TEZ-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529528#comment-14529528
 ] 

Bikas Saha commented on TEZ-2404:
---------------------------------

Routing some events directly and others indirectly may cause ordering 
issues, so the current patch may introduce them.

So we either fix recovery now, or punt it for later and revert TEZ-2325. If we 
revert TEZ-2325, then let's please create a jira for the recovery fix (unless one 
exists already) and mark it as blocking TEZ-2325 and TEZ-2418.

> Handle DataMovementEvent before its TaskAttemptCompletedEvent
> -------------------------------------------------------------
>
>                 Key: TEZ-2404
>                 URL: https://issues.apache.org/jira/browse/TEZ-2404
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>            Priority: Critical
>         Attachments: TEZ-2404-1.patch, TEZ-2404-2.patch
>
>
> TEZ-2325 routes TASK_ATTEMPT_COMPLETED_EVENT directly to the attempt, but this 
> causes a recovery issue. Recovery needs the DataMovementEvent to be handled 
> before the TaskAttemptCompletedEvent; otherwise the DataMovementEvent may be 
> lost during recovery, causing its dependent tasks to hang.
> There are 2 ways to fix this issue:
> 1. Still route TaskAttemptCompletedEvent in Vertex
> 2. Route DataMovementEvent before TaskAttemptCompletedEvent in 
> TezTaskAttemptListener
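
To make the ordering requirement concrete, here is a minimal sketch of option 2. 
It is not the Tez code: the class EventOrdering, the enum Kind, and the method 
orderForRouting are hypothetical names made up for illustration. It only shows 
the idea of holding back the completion event until every data movement event 
in the same batch has been handed over.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Tez.
final class EventOrdering {

    // Simplified stand-ins for the two event types discussed in this issue.
    enum Kind { DATA_MOVEMENT, TASK_ATTEMPT_COMPLETED }

    // Returns the batch reordered so that every completion event comes after
    // all other events, keeping the relative order within each group.
    static List<Kind> orderForRouting(List<Kind> incoming) {
        List<Kind> ordered = new ArrayList<>(incoming.size());
        // First pass: everything except completion events, in arrival order.
        for (Kind k : incoming) {
            if (k != Kind.TASK_ATTEMPT_COMPLETED) {
                ordered.add(k);
            }
        }
        // Second pass: completion events go last.
        for (Kind k : incoming) {
            if (k == Kind.TASK_ATTEMPT_COMPLETED) {
                ordered.add(k);
            }
        }
        return ordered;
    }
}
```

This only orders events within a single batch. If the two event types travel 
along different paths (one direct, one through the Vertex), a reordering like 
this would not by itself guarantee the order in which they are finally handled.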



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
