[ https://issues.apache.org/jira/browse/TEZ-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang updated TEZ-1642:
----------------------------
    Description: 
TestAMRecovery sometimes fails on testVertexPartiallyFinished_XXX.
The scenario is that we'd like to kill the AM when the vertex is partially
finished (with 2 tasks, task_0 is finished and task_1 is running). During
recovery, task_0 should not rerun and task_1 should rerun. (We use the recovery
log (TaskAttemptFinishedEvent) to judge whether a task is rerun.)
Currently, VertexManager.onSourceTaskCompleted is used to control when to kill
the AM, but this is not reliable. VertexManager.onSourceTaskCompleted is not
invoked at the moment a task attempt finishes (the TaskAttempt sends an event
to the Task to report that the attempt has finished, and then the Task sends an
event to the Vertex, which triggers VM.onSourceTaskCompleted).
The following ordering is possible: task_0 finished -> task_1 finished ->
VM.onSourceTaskCompleted -> VM.onSourceTaskCompleted
In this case, the first VM.onSourceTaskCompleted call treats the vertex as
partially completed, but it is actually fully completed.
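
The ordering above can be reproduced with a small single-threaded simulation. This is a minimal sketch, not Tez code: the class CallbackLagDemo, its queue and its methods are hypothetical stand-ins for the AM's event dispatch (TaskAttempt -> Task -> Vertex -> VertexManager), collapsed into one queued callback per task. It assumes that a task's finished state is recorded as soon as its attempt finishes, while the VertexManager notification is only queued and delivered later.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Hypothetical simulation of the callback lag; not Tez's real classes or dispatcher.
public class CallbackLagDemo {
    static final int TOTAL_TASKS = 2;

    // Stand-in for the AM's single-threaded event queue.
    private final Queue<Runnable> eventQueue = new ArrayDeque<>();
    // Tasks whose attempts have finished; updated immediately, not via the queue.
    private final Set<Integer> finishedTasks = new HashSet<>();
    // Number of onSourceTaskCompleted callbacks delivered so far.
    private int callbacksSeen = 0;
    // One line per callback: what it could infer from callbacks vs. the real state.
    final List<String> observations = new ArrayList<>();

    // A task attempt finishes: record the state now, but only queue the
    // vertex manager notification (it is delivered later by drain()).
    void taskAttemptFinished(int taskId) {
        finishedTasks.add(taskId);
        eventQueue.add(() -> onSourceTaskCompleted(taskId));
    }

    // Hypothetical stand-in for VertexManager.onSourceTaskCompleted.
    void onSourceTaskCompleted(int taskId) {
        callbacksSeen++;
        observations.add("callback #" + callbacksSeen + " (task_" + taskId + "): inferred "
                + callbacksSeen + "/" + TOTAL_TASKS + " finished, actual "
                + finishedTasks.size() + "/" + TOTAL_TASKS + " finished");
    }

    // Deliver queued events in FIFO order.
    void drain() {
        Runnable event;
        while ((event = eventQueue.poll()) != null) {
            event.run();
        }
    }

    public static void main(String[] args) {
        CallbackLagDemo demo = new CallbackLagDemo();
        // task_0 finished -> task_1 finished -> callback -> callback
        demo.taskAttemptFinished(0);
        demo.taskAttemptFinished(1);
        demo.drain();
        demo.observations.forEach(System.out::println);
    }
}
```

Running main prints:

callback #1 (task_0): inferred 1/2 finished, actual 2/2 finished
callback #2 (task_1): inferred 2/2 finished, actual 2/2 finished

Because both attempts finish before the queue is drained, the first callback runs when the vertex is already fully finished, so a kill condition based only on the number of callbacks seen cannot reliably catch the partially finished state.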

  was:
TestAMRecovery sometimes fails on testVertexPartiallyFinished_XXX.
The scenario is that we'd like to kill the AM when the vertex is partially
finished (with 2 tasks, task_0 is finished and task_1 is running). During
recovery, task_0 should not rerun and task_1 should rerun. (We use the recovery
log (TaskAttemptFinishedEvent) to judge whether a task is rerun.)
Currently, VertexManager.onSourceTaskCompleted is used to control when to kill
the AM, but this is not reliable. VertexManager.onSourceTaskCompleted is not
invoked at the moment a task attempt finishes (the TaskAttempt sends an event
to the Task to report that the attempt has finished, and then the Task sends an
event to the Vertex, which triggers VM.onSourceTaskCompleted).
The following ordering is possible: task_attempt_0 finished -> task_attempt_1
finished -> VM.onSourceTaskCompleted -> VM.onSourceTaskCompleted



> TestAMRecovery sometimes fails
> ------------------------------
>
>                 Key: TEZ-1642
>                 URL: https://issues.apache.org/jira/browse/TEZ-1642
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>         Attachments: TEZ-1642-2.patch, TEZ-1642-3.patch, TEZ-1642.patch
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
