[
https://issues.apache.org/jira/browse/TEZ-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129592#comment-14129592
]
Jeff Zhang commented on TEZ-1559:
---------------------------------
Attached the new patch.
bq. Recovery data is meant to be internal and not exposed to users in any case.
Makes sense; the new patch uses the vertexId instead.
Other changes:
* removed the counter tracking
* moved the configuration fields to RecoveryService
> Add system tests for AM recovery
> --------------------------------
>
> Key: TEZ-1559
> URL: https://issues.apache.org/jira/browse/TEZ-1559
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Jeff Zhang
> Assignee: Jeff Zhang
> Attachments: Tez-1559-2.patch, Tez-1559-3.patch, Tez-1559.patch
>
>
> * [Fine-grained recovery, task-level] In a vertex, task 0 is done and task 1
> is running. A history flush happens, then the AM dies. Once the AM is
> recovered, task 0 is not re-run; task 1 is re-run.
> * [Data movement types] Test AM recovery with all data movement types,
> including 1-1, broadcast, and scatter-gather with/without shuffle. The AM
> should die in two scenarios: when the first vertex's tasks have all finished,
> and when only some of them have finished.
> * [Kill AM many times] Set the AM max attempts to a high number and kill many
> attempts. The last AM can still be recovered from the latest AM history data.
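
The task-level scenario in the issue description can be illustrated with a minimal, self-contained sketch. It only models the expected outcome (tasks recorded as finished in the flushed history are skipped, everything else is re-run) and is not Tez code: RecoverySketch, TaskState, and tasksToRerun are hypothetical names chosen for illustration, and the real recovery logic in the patch may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of task-level recovery; none of these names are Tez APIs.
class RecoverySketch {
    enum TaskState { SUCCEEDED, RUNNING }

    // Returns the IDs of the tasks the recovered AM must run again, given the
    // last task states that were flushed to history before the AM died.
    static List<Integer> tasksToRerun(Map<Integer, TaskState> flushedHistory) {
        List<Integer> rerun = new ArrayList<>();
        for (Map.Entry<Integer, TaskState> entry : flushedHistory.entrySet()) {
            // Only tasks recorded as SUCCEEDED are safe to skip.
            if (entry.getValue() != TaskState.SUCCEEDED) {
                rerun.add(entry.getKey());
            }
        }
        return rerun;
    }

    public static void main(String[] args) {
        // Scenario from the issue: task 0 is done, task 1 is still running.
        Map<Integer, TaskState> history = new LinkedHashMap<>();
        history.put(0, TaskState.SUCCEEDED);
        history.put(1, TaskState.RUNNING);
        System.out.println(tasksToRerun(history)); // prints [1]
    }
}
```

A system test for this scenario would presumably assert the same thing against a real DAG: after the AM restarts, only task 1 gets a new attempt.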