[
https://issues.apache.org/jira/browse/TEZ-1909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387888#comment-14387888
]
Jeff Zhang commented on TEZ-1909:
---------------------------------
[~hitesh] Uploaded a new patch; please help review it.
* Regarding the corrupted summary record, I added the following 2 checks.
SummaryEventProto.parseDelimitedFrom(summaryStream) only reads the size of the
protobuf, so an exception may still be thrown later when parsing the fields.
There's one unit test for this in TestRecoveryParser.
{code}
TezDAGID dagId = TezDAGID.fromString(proto.getDagId());
if (dagId == null) {
  throw new IOException("null dagId, summary records may be corrupted");
}
{code}
{code}
try {
  dagSummaryDataMap.get(dagId).handleSummaryEvent(proto);
} catch (Exception e) {
  // any exception when parsing protobuf
  throw new IOException("Error when parsing summary event proto", e);
}
{code}
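To illustrate the idea behind the two checks without depending on Tez or protobuf, here is a minimal, self-contained sketch. It assumes a made-up record format (a 4-byte length prefix followed by a UTF-8 dagId string); SummaryRecordSketch, writeRecord, readRecord, readAll, and MAX_RECORD_BYTES are hypothetical names that do not exist in the patch. The point is only that reading the length prefix successfully says nothing about the payload, so the payload has to be validated and any failure reported as an IOException.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a summary-record reader; not Tez code.
final class SummaryRecordSketch {
    // Assumed upper bound so a corrupted length prefix cannot trigger a huge allocation.
    static final int MAX_RECORD_BYTES = 1 << 20;

    // Writes one record: 4-byte length prefix, then the UTF-8 payload.
    static void writeRecord(DataOutputStream out, String dagId) throws IOException {
        byte[] payload = dagId.getBytes(StandardCharsets.UTF_8);
        out.writeInt(payload.length);
        out.write(payload);
    }

    // Reads one record; returns null when no complete length prefix is left.
    static String readRecord(DataInputStream in) throws IOException {
        int length;
        try {
            length = in.readInt();
        } catch (EOFException e) {
            return null; // treated as end of stream
        }
        if (length <= 0 || length > MAX_RECORD_BYTES) {
            throw new IOException("bad record length " + length
                + ", summary records may be corrupted");
        }
        byte[] payload = new byte[length];
        try {
            in.readFully(payload);
        } catch (EOFException e) {
            // The prefix was readable, but the payload is cut short.
            throw new IOException("truncated payload, summary records may be corrupted", e);
        }
        String dagId = new String(payload, StandardCharsets.UTF_8);
        // Analogue of the null-dagId check: validate the decoded field.
        if (!dagId.startsWith("dag_")) {
            throw new IOException("unexpected dagId, summary records may be corrupted");
        }
        return dagId;
    }

    // Reads every record from the buffer; any corruption surfaces as IOException.
    static List<String> readAll(byte[] data) throws IOException {
        List<String> dagIds = new ArrayList<>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        String dagId;
        while ((dagId = readRecord(in)) != null) {
            dagIds.add(dagId);
        }
        return dagIds;
    }
}
```

With this sketch, a buffer whose last record is cut off in the middle of the payload makes readAll throw an IOException instead of silently returning partial data. The caller can then decide how to treat a corrupt last record.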
> Remove need to copy over all events from attempt 1 to attempt 2 dir
> -------------------------------------------------------------------
>
> Key: TEZ-1909
> URL: https://issues.apache.org/jira/browse/TEZ-1909
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Hitesh Shah
> Assignee: Jeff Zhang
> Attachments: TEZ-1909-1.patch, TEZ-1909-2.patch, TEZ-1909-3.patch,
> TEZ-1909-4.patch
>
>
> Use of file versions should prevent the need for copying over data into a
> second attempt dir. Care needs to be taken with "last corrupt record"
> handling.