[
https://issues.apache.org/jira/browse/TEZ-1909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357592#comment-14357592
]
Hitesh Shah edited comment on TEZ-1909 at 3/11/15 9:02 PM:
-----------------------------------------------------------
Minor comments:
- Data can be partially written, so reading the last record may fail. Maybe
we should add a simulated test for this by writing invalid data to the end of
an intermediate summary file and DAG file and checking that the code handles
it correctly?
- skipAllOtherEvents should probably be a flag that applies across all files
for a given DAG. At the moment, it is considered only within a single DAG file
and then reset.
- The log line LOG.info("isSpeculationEnabled:" + isSpeculationEnabled); was
removed; not sure why.
{code}
for (int attemptNum = 1; attemptNum <= 3; ++attemptNum) {
  List<HistoryEvent> historyEvents = new ArrayList<HistoryEvent>();
  for (int i = 1; i <= attemptNum; ++i) {
    Path currentAttemptRecoveryDataDir =
        TezCommonUtils.getAttemptRecoveryPath(recoveryDataDir, i);
    Path recoveryFilePath = new Path(currentAttemptRecoveryDataDir,
        appId.toString().replace("application", "dag") + "_1" +
        TezConstants.DAG_RECOVERY_RECOVER_FILE_SUFFIX);
    historyEvents.addAll(RecoveryParser.parseDAGRecoveryFile(
        fs.open(recoveryFilePath)));
  }
{code}
The above code in TestDAGRecovery needs a bit of cleanup. It is not clear why
two nested loops are needed to read the recovery data for the 3 attempts.
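
To make the first comment concrete, here is a minimal sketch of the kind of simulated test being suggested. It does not use the Tez recovery file format or RecoveryParser. The class TruncatedRecordSketch, its length-prefixed record layout, and both method names are hypothetical stand-ins chosen only to show a reader that keeps every complete record and drops a partially written last one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a recovery log; this is NOT the Tez file format.
class TruncatedRecordSketch {

  // Writes each record as a 4-byte length followed by its UTF-8 payload.
  static byte[] writeRecords(List<String> records) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      for (String record : records) {
        byte[] payload = record.getBytes(StandardCharsets.UTF_8);
        out.writeInt(payload.length);
        out.write(payload);
      }
    }
    return bytes.toByteArray();
  }

  // Reads records until the data ends; a last record that was cut off
  // mid-write is dropped instead of failing the whole read.
  static List<String> readRecords(byte[] data) throws IOException {
    List<String> records = new ArrayList<>();
    try (DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(data))) {
      while (in.available() > 0) {
        try {
          int length = in.readInt();
          // A negative length, or one larger than the remaining bytes,
          // means the trailing record is corrupt or incomplete.
          if (length < 0 || length > in.available()) {
            break;
          }
          byte[] payload = new byte[length];
          in.readFully(payload);
          records.add(new String(payload, StandardCharsets.UTF_8));
        } catch (EOFException e) {
          // Fewer than 4 bytes were left, so even the length prefix is partial.
          break;
        }
      }
    }
    return records;
  }
}
```

A test along these lines would write a few records, cut bytes off the end of the resulting array, and check that readRecords still returns all earlier records. Without a per-record checksum this sketch only detects truncation and implausible lengths. Garbage bytes that happen to form a valid length would still be read as a record.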
> Remove need to copy over all events from attempt 1 to attempt 2 dir
> -------------------------------------------------------------------
>
> Key: TEZ-1909
> URL: https://issues.apache.org/jira/browse/TEZ-1909
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Hitesh Shah
> Assignee: Jeff Zhang
> Attachments: TEZ-1909-1.patch
>
>
> Use of file versions should prevent the need for copying over data into a
> second attempt dir. Care needs to be taken to handle the "last corrupt
> record" case.