[ https://issues.apache.org/jira/browse/TEZ-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531890#comment-16531890 ]
Jonathan Eagles commented on TEZ-3912:
--------------------------------------
This patch brings us in line with MAPREDUCE-6633. Since we are wrapping the
exception, we may want to log the exception itself instead of just its message.
Also, the change in TEZ-3833 may no longer be necessary.
{code:title=mapreduce.Fetcher}
try {
  // Go!
  LOG.info("fetcher#" + id + " about to shuffle output of map "
      + mapOutput.getMapId() + " decomp: " + decompressedLength
      + " len: " + compressedLength + " to " +
      mapOutput.getDescription());
  mapOutput.shuffle(host, is, compressedLength, decompressedLength,
      metrics, reporter);
} catch (java.lang.InternalError | Exception e) {
  LOG.warn("Failed to shuffle for fetcher#" + id, e);
  throw new IOException(e);
}
{code}
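For illustration only, here is a minimal, self-contained sketch of why converting every failure to IOException matters for retries. The names FetcherSketch, ShuffleStep, shuffleOnce and fetchWithRetries are hypothetical and are not Tez or MapReduce APIs. System.err stands in for LOG.warn, and the simple retry loop is an assumption about how a caller might react to the IOException, not how Tez actually schedules retries.

```java
import java.io.IOException;

class FetcherSketch {

    /** Hypothetical stand-in for mapOutput.shuffle(...); may fail in any way. */
    interface ShuffleStep {
        void shuffle() throws Exception;
    }

    /** One shuffle attempt: any failure is logged and rethrown as IOException. */
    static void shuffleOnce(int id, ShuffleStep step) throws IOException {
        try {
            step.shuffle();
        } catch (InternalError | Exception e) {
            // Log the exception itself (not just its message), then wrap it so
            // callers only need to handle IOException.
            System.err.println("Failed to shuffle for fetcher#" + id + ": " + e);
            throw new IOException(e);
        }
    }

    /** Assumed retry policy: returns the attempt that succeeded, or -1. */
    static int fetchWithRetries(int id, ShuffleStep step, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                shuffleOnce(id, step);
                return attempt;
            } catch (IOException e) {
                // Treated as a retriable fetch failure; try again.
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulates a codec that rejects corrupted data on the first attempt only.
        ShuffleStep flaky = () -> {
            if (calls[0]++ == 0) {
                throw new IllegalArgumentException("corrupt header");
            }
        };
        System.out.println("attempts used: " + fetchWithRetries(1, flaky, 3));
    }
}
```

Without the broad catch, the IllegalArgumentException in this sketch would escape fetchWithRetries on the first attempt instead of being retried.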
> Fetchers should be more robust to corrupted inputs
> --------------------------------------------------
>
> Key: TEZ-3912
> URL: https://issues.apache.org/jira/browse/TEZ-3912
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jason Lowe
> Assignee: Kuhu Shukla
> Priority: Major
> Attachments: TEZ-3912.001.patch
>
>
> I recently saw a case where a bad node in the cluster produced corrupted
> shuffle data that caused the codec to throw IllegalArgumentException when
> trying to fetch. Fetchers currently only handle IOException and
> InternalError, and any other type of exception will cause the entire task to
> be torn down. We should consider catching Exception like MapReduce does to
> be more robust in light of other types of errors coming from the codec and
> allow retries to occur.