[
https://issues.apache.org/jira/browse/TEZ-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207494#comment-14207494
]
Jeff Zhang commented on TEZ-1642:
---------------------------------
bq. Can testVertexPartiallyFinished_XXX be achieved by only scheduling 1 task,
waiting a certain amount of time (or launching a thread to poll the task state)
and then halting the JVM? On the second attempt, the VM should schedule both
tasks, unlike attempt 1 where only 1 task is scheduled?
Sounds like a good idea; I will try that.
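
A rough sketch of the "poll the task state, then act" idea, in plain Java. Nothing here is a Tez API: PollThenAct, pollUntil, the condition and the action are all hypothetical placeholders. In the real test the condition would check the task state and the action would halt the JVM (for example via Runtime.getRuntime().halt(...)); here the action is injected so the helper can be exercised safely.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Tez: polls a condition and runs an action once it holds.
final class PollThenAct {

    /**
     * Checks the condition every intervalMs milliseconds until it holds or
     * timeoutMs milliseconds have elapsed. Runs the action and returns true as
     * soon as the condition holds; returns false on timeout or interruption.
     */
    static boolean pollUntil(BooleanSupplier condition, long intervalMs,
                             long timeoutMs, Runnable action) {
        long start = System.nanoTime();
        long timeoutNanos = timeoutMs * 1_000_000L;
        while (System.nanoTime() - start < timeoutNanos) {
            if (condition.getAsBoolean()) {
                action.run();
                return true;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt status and give up polling.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

The point of polling the state directly is that the decision to stop no longer depends on when a callback happens to be delivered.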
> TestAMRecovery sometimes fail
> -----------------------------
>
> Key: TEZ-1642
> URL: https://issues.apache.org/jira/browse/TEZ-1642
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jeff Zhang
> Assignee: Jeff Zhang
> Attachments: TEZ-1642-2.patch, TEZ-1642-3.patch, TEZ-1642.patch
>
>
> TestAMRecovery sometimes fails on testVertexPartiallyFinished_XXX.
> The scenario is that we'd like to kill the AM when a vertex is partially
> finished (with 2 tasks, task_0 is finished and task_1 is running). On recovery,
> task_0 should not rerun and task_1 should rerun. (We use the recovery
> log (TaskAttemptFinishedEvent) to judge whether a task is rerun.)
> Currently, we use VertexManager.onSourceTaskCompleted to control when to kill
> the AM, but this is not reliable. VertexManager.onSourceTaskCompleted is not
> invoked at the moment the task attempt finishes (the TaskAttempt sends an
> event to the Task to say that the TaskAttempt is finished, and then the Task
> sends an event to the Vertex to trigger VM.onSourceTaskCompleted).
> The following case is possible: task_0 finished -> task_1 finished ->
> VM.onSourceTaskCompleted -> VM.onSourceTaskCompleted
> In this case, we treat the vertex as partially completed in the first
> VM.onSourceTaskCompleted, but the vertex is actually fully completed.
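
The ordering problem described above can be reproduced with a small, deterministic simulation. This is only an illustrative sketch: CallbackLagDemo and its methods are hypothetical names, not Tez code, and a simple queue stands in for the asynchronous event dispatch between TaskAttempt, Task and Vertex. It records how many tasks had really finished at the moment each completion callback was delivered.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for the scenario; none of these names are Tez APIs.
final class CallbackLagDemo {

    // Delivers all queued callbacks, recording the real finished count seen by each.
    private static void drain(Queue<Integer> pending, List<Integer> observed, int finished) {
        while (!pending.isEmpty()) {
            pending.remove();
            observed.add(finished);
        }
    }

    /**
     * Simulates a vertex with two tasks whose completion callbacks go through a queue.
     * Returns, per callback, the number of tasks actually finished when it ran.
     */
    static List<Integer> finishedCountSeenByCallbacks(boolean bothFinishBeforeDispatch) {
        Queue<Integer> pending = new ArrayDeque<>();
        List<Integer> observed = new ArrayList<>();
        int finished = 0;

        // task_0 finishes; its callback is queued but not yet delivered.
        finished++;
        pending.add(0);

        if (!bothFinishBeforeDispatch) {
            // The dispatcher catches up before task_1 finishes.
            drain(pending, observed, finished);
        }

        // task_1 finishes and queues its callback.
        finished++;
        pending.add(1);

        drain(pending, observed, finished);
        return observed;
    }

    public static void main(String[] args) {
        System.out.println(finishedCountSeenByCallbacks(false)); // prints [1, 2]
        System.out.println(finishedCountSeenByCallbacks(true));  // prints [2, 2]
    }
}
```

In the second run the first callback already sees two finished tasks, so a test that kills the AM inside that callback is acting on a fully completed vertex rather than a partially completed one.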