[ 
https://issues.apache.org/jira/browse/TEZ-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531890#comment-16531890
 ] 

Jonathan Eagles commented on TEZ-3912:
--------------------------------------

This patch brings us in line with MAPREDUCE-6633. When wrapping the exception,
we may want to log the exception itself (with its stack trace) instead of just
the message. Also, the change in TEZ-3833 may no longer be necessary.
{code:title=mapreduce.Fetcher}
      try {
        // Go!
        LOG.info("fetcher#" + id + " about to shuffle output of map "
            + mapOutput.getMapId() + " decomp: " + decompressedLength
            + " len: " + compressedLength + " to "
            + mapOutput.getDescription());
        mapOutput.shuffle(host, is, compressedLength, decompressedLength,
            metrics, reporter);
      } catch (java.lang.InternalError | Exception e) {
        LOG.warn("Failed to shuffle for fetcher#"+id, e);
        throw new IOException(e);
      }
{code}
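
For illustration, here is a minimal standalone sketch of the same pattern:
catch both InternalError and Exception, log the throwable itself rather than
only its message, and rethrow as IOException so the existing IOException
handling can retry. The class GuardedShuffleSketch, the ShuffleStep interface
and the guardedShuffle method are hypothetical names made up for this example,
and java.util.logging stands in for whatever logger the real Fetcher uses.
This is not the Tez or MapReduce code.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical standalone sketch; not the Tez or MapReduce Fetcher code.
class GuardedShuffleSketch {
  private static final Logger LOG =
      Logger.getLogger(GuardedShuffleSketch.class.getName());

  /** Hypothetical stand-in for the shuffle step, which may throw anything. */
  interface ShuffleStep {
    void run() throws Exception;
  }

  /** Runs the step and converts any failure into an IOException. */
  static void guardedShuffle(int fetcherId, ShuffleStep step)
      throws IOException {
    try {
      step.run();
    } catch (InternalError | Exception e) {
      // Pass the throwable so the stack trace is logged, not just the message.
      LOG.log(Level.WARNING, "Failed to shuffle for fetcher#" + fetcherId, e);
      // Wrap so that callers which already handle IOException can retry.
      throw new IOException(e);
    }
  }

  public static void main(String[] args) {
    try {
      // Simulate a codec rejecting corrupted shuffle data.
      guardedShuffle(1, () -> {
        throw new IllegalArgumentException("corrupted input");
      });
    } catch (IOException e) {
      System.out.println("wrapped cause: " + e.getCause());
    }
  }
}
```

One consequence of this shape is that an IOException thrown by the step is
also wrapped in a second IOException, so callers that inspect the failure need
to look at getCause().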

> Fetchers should be more robust to corrupted inputs
> --------------------------------------------------
>
>                 Key: TEZ-3912
>                 URL: https://issues.apache.org/jira/browse/TEZ-3912
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jason Lowe
>            Assignee: Kuhu Shukla
>            Priority: Major
>         Attachments: TEZ-3912.001.patch
>
>
> I recently saw a case where a bad node in the cluster produced corrupted 
> shuffle data that caused the codec to throw IllegalArgumentException when 
> trying to fetch.  Fetchers currently only handle IOException and 
> InternalError, and any other type of exception will cause the entire task to 
> be torn down.  We should consider catching Exception like MapReduce does to 
> be more robust in light of other types of errors coming from the codec and 
> allow retries to occur.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
