[ https://issues.apache.org/jira/browse/TEZ-1909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359949#comment-14359949 ]
Jeff Zhang commented on TEZ-1909:
---------------------------------
Uploaded a new patch to address the review comments. [~hitesh], please help review.
bq. there can be cases where data is partially written hence there might be an
error when reading the last record. Maybe we should add a simulated test for
this by writing invalid data to the end of an intermediate summary and dag file
and seeing whether the code handles it correctly?
Corrupted records are now handled correctly; the new patch adds a unit test for
a corrupted record in a recovery file. A corrupted record in the summary file
causes recovery to fail outright, and I think that behavior makes sense.
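As a rough illustration of the two behaviors above (not the code in the patch),
here is a minimal sketch. It assumes a hypothetical length-prefixed record
layout, which is not Tez's real on-disk format, and the names
TruncatedTailSketch, writeRecords, readRecords and failOnCorruption are made up
for this example. In lenient mode a partially written last record is dropped;
in strict mode it fails the read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TruncatedTailSketch {

  // Hypothetical layout (an assumption, not Tez's format):
  // a 4-byte length followed by that many payload bytes.
  static byte[] writeRecords(String... payloads) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      for (String payload : payloads) {
        byte[] data = payload.getBytes(StandardCharsets.UTF_8);
        out.writeInt(data.length);
        out.write(data);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Lenient mode keeps every complete record and drops a broken tail.
  // Strict mode (failOnCorruption) turns a broken record into a failure.
  static List<String> readRecords(byte[] file, boolean failOnCorruption) {
    List<String> records = new ArrayList<>();
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
    try {
      while (in.available() > 0) {
        int length = in.readInt(); // throws EOFException if fewer than 4 bytes remain
        if (length < 0 || length > in.available()) {
          throw new EOFException("record is truncated or has an invalid length");
        }
        byte[] data = new byte[length];
        in.readFully(data);
        records.add(new String(data, StandardCharsets.UTF_8));
      }
    } catch (EOFException e) {
      if (failOnCorruption) {
        throw new IllegalStateException("corrupted record", e);
      }
      // Lenient: fall through and return the records read so far.
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return records;
  }

  public static void main(String[] args) {
    byte[] full = writeRecords("a", "bb", "ccc");
    byte[] cut = Arrays.copyOf(full, full.length - 2); // simulate a partial write
    System.out.println(readRecords(cut, false));
  }
}
```

The strict flag stands in for the summary-file case, where failing fast is
preferred over recovering from partial state.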
bq. skipAllOtherEvents should probably be a flag across all files for a given
dag. At the moment, it is considered only for a single dag file and reset.
Corrected in the new patch.
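To show the scoping issue in isolation, here is a minimal sketch (not the
patch itself). Events are plain strings, the DAG_FINISHED marker that triggers
skipping is an assumption made for the example, and SkipFlagSketch and replay
are hypothetical names. The point is only that the flag is declared outside
the per-file loop, so it is not reset when the next file of the same DAG is
opened.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SkipFlagSketch {

  // Hypothetical marker event (an assumption): once it is seen,
  // every later event of the same DAG is ignored.
  static final String SKIP_MARKER = "DAG_FINISHED";

  // Each inner list stands for one recovery file of the same DAG,
  // given in attempt order.
  static List<String> replay(List<List<String>> filesInAttemptOrder) {
    List<String> applied = new ArrayList<>();
    // Declared outside the file loop so that it survives across files.
    boolean skipAllOtherEvents = false;
    for (List<String> file : filesInAttemptOrder) {
      for (String event : file) {
        if (skipAllOtherEvents) {
          continue;
        }
        applied.add(event);
        if (SKIP_MARKER.equals(event)) {
          skipAllOtherEvents = true;
        }
      }
    }
    return applied;
  }

  public static void main(String[] args) {
    List<List<String>> files = Arrays.asList(
        Arrays.asList("VERTEX_STARTED", "DAG_FINISHED", "TASK_STARTED"),
        Arrays.asList("TASK_FINISHED"));
    System.out.println(replay(files));
  }
}
```

If the flag were declared inside the outer loop instead, the events in the
second file would be applied again, which is the behavior the review comment
points out.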
bq. log line "LOG.info("isSpeculationEnabled:" + isSpeculationEnabled);" was
removed - not sure why.
It was added accidentally in another JIRA, so I removed it here.
bq. The above code needs a bit of cleanup in TestDAGRecovery - not sure why we
need 2 loops for the 3 attempts' recovery data.
The inner loop reads the DAG recovery logs, because the DAG recovery log is now
spread across each attempt's recovery path. The outer loop tests each attempt.
I have added more comments in the new patch.
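The shape of the two loops can be sketched as follows. This is not the code in
TestDAGRecovery; PerAttemptLogSketch and eventsSeenByEachAttempt are
hypothetical names, events are plain strings, and the sketch assumes that
checking attempt N means reading the logs under the recovery paths of attempts
1 through N.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PerAttemptLogSketch {

  // attemptLogs.get(i) holds the DAG events written under the recovery
  // path of attempt i + 1 (a simplification for this example).
  static List<List<String>> eventsSeenByEachAttempt(List<List<String>> attemptLogs) {
    List<List<String>> perAttempt = new ArrayList<>();
    // Outer loop: one check per attempt.
    for (int attempt = 1; attempt <= attemptLogs.size(); attempt++) {
      List<String> events = new ArrayList<>();
      // Inner loop: the DAG log is no longer copied forward, so collect it
      // from every attempt path up to the one being checked.
      for (int i = 0; i < attempt; i++) {
        events.addAll(attemptLogs.get(i));
      }
      perAttempt.add(events);
    }
    return perAttempt;
  }

  public static void main(String[] args) {
    List<List<String>> logs = Arrays.asList(
        Arrays.asList("e1"), Arrays.asList("e2"), Arrays.asList("e3"));
    System.out.println(eventsSeenByEachAttempt(logs));
  }
}
```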
> Remove need to copy over all events from attempt 1 to attempt 2 dir
> -------------------------------------------------------------------
>
> Key: TEZ-1909
> URL: https://issues.apache.org/jira/browse/TEZ-1909
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Hitesh Shah
> Assignee: Jeff Zhang
> Attachments: TEZ-1909-1.patch, TEZ-1909-2.patch
>
>
> Use of file versions should prevent the need for copying over data into a
> second attempt dir. Care needs to be taken to handle "last corrupt record"
> handling.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)