[
https://issues.apache.org/jira/browse/TEZ-1909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340077#comment-14340077
]
Jeff Zhang edited comment on TEZ-1909 at 2/27/15 12:26 PM:
-----------------------------------------------------------
Attached a patch. [~hitesh], please help review it.
* Split the summary events and recovery events into per-attempt directories,
like the following:
{code}
-rw-r--r-- 1 jzhang supergroup 1661372 2015-02-27 14:58 /tmp/temp-1397147140/.tez/application_1424916692162_0036/recovery/1/dag_1424916692162_0036_1.recovery
-rw-r--r-- 1 jzhang supergroup 145 2015-02-27 14:58 /tmp/temp-1397147140/.tez/application_1424916692162_0036/recovery/1/summary
-rw-r--r-- 1 jzhang supergroup 39284 2015-02-27 14:59 /tmp/temp-1397147140/.tez/application_1424916692162_0036/recovery/2/dag_1424916692162_0036_1.recovery
-rw-r--r-- 1 jzhang supergroup 222 2015-02-27 14:59 /tmp/temp-1397147140/.tez/application_1424916692162_0036/recovery/2/summary
-rw-r--r-- 1 jzhang supergroup 19028 2015-02-27 13:59 /tmp/temp-1397147140/.tez/application_1424916692162_0036/recovery/3/dag_1424916692162_0036_1.recovery
-rw-r--r-- 1 jzhang supergroup 448 2015-02-27 13:59 /tmp/temp-1397147140/.tez/application_1424916692162_0036/recovery/3/summary
{code}
* Remove dataRecoveredFlagFile. I think it was there to detect whether the AM
crashed during recovery. Since we no longer copy data to the new attempt dir,
no data is lost if the AM crashes while recovering, so a crash won't affect
the next AM attempt.
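
A minimal sketch of the read-in-place idea behind this layout, not the code
in the patch: given the child names found under the recovery directory, a new
attempt could pick the directories of all earlier attempts, oldest first, and
read their summary and .recovery files where they already are. The class and
method names here are hypothetical, and the real patch works against the
Hadoop FileSystem API rather than plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not part of Tez.
class RecoveryDirSketch {

    // Returns the attempt directories a recovering attempt would read,
    // oldest first. Nothing is copied: files stay where each attempt wrote them.
    static List<String> previousAttemptDirs(String recoveryBase,
                                            List<String> childNames,
                                            int currentAttempt) {
        List<Integer> attempts = new ArrayList<>();
        for (String name : childNames) {
            try {
                int n = Integer.parseInt(name);
                if (n >= 1 && n < currentAttempt) {
                    attempts.add(n);
                }
            } catch (NumberFormatException e) {
                // Not an attempt directory; ignore it.
            }
        }
        // Sort numerically so that attempt 10 comes after attempt 2.
        Collections.sort(attempts);
        List<String> dirs = new ArrayList<>();
        for (int n : attempts) {
            dirs.add(recoveryBase + "/" + n);
        }
        return dirs;
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList("2", "1", "3", "tmp");
        // Attempt 4 would read the summary files of attempts 1, 2 and 3 in order.
        for (String dir : previousAttemptDirs("/recovery", children, 4)) {
            System.out.println(dir + "/summary");
        }
    }
}
```

Because earlier directories are only ever read, a crash in the middle of
recovery leaves them untouched, which is why a separate flag file is not
needed in this scheme.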
> Remove need to copy over all events from attempt 1 to attempt 2 dir
> -------------------------------------------------------------------
>
> Key: TEZ-1909
> URL: https://issues.apache.org/jira/browse/TEZ-1909
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Hitesh Shah
> Assignee: Jeff Zhang
> Attachments: TEZ-1909-1.patch
>
>
> Use of file versions should prevent the need for copying over data into a
> second attempt dir. Care needs to be taken with "last corrupt record"
> handling.
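
One way to picture the "last corrupt record" concern, as a sketch under
stated assumptions: if an attempt dies mid-write, the tail of its file may
hold a partial record, and a reader should keep every complete record before
it. The example below assumes a hypothetical format of a 4-byte big-endian
length followed by the payload; this is not the actual Tez recovery file
format, and the class name is made up for illustration.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for illustration only; not part of Tez.
class TailTolerantReaderSketch {

    // Reads length-prefixed records and stops at the first record that is
    // truncated or has an invalid length, keeping everything read so far.
    static List<byte[]> readCompleteRecords(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        List<byte[]> records = new ArrayList<>();
        while (buf.remaining() >= Integer.BYTES) {
            int len = buf.getInt();
            if (len < 0 || len > buf.remaining()) {
                break; // truncated or corrupt tail
            }
            byte[] payload = new byte[len];
            buf.get(payload);
            records.add(payload);
        }
        return records;
    }
}
```

A length check like this only catches truncation and invalid lengths; a real
format would likely also need a checksum or similar to detect corrupted
payload bytes.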