[
https://issues.apache.org/jira/browse/MAPREDUCE-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233201#comment-13233201
]
Nishan Shetty commented on MAPREDUCE-4030:
------------------------------------------
I don't see anything in the AM log indicating that the AM was notified of the map output copy failure
in the reducer, or that the AM relaunched the affected map task.
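The expected recovery path described above works roughly like this. The reducer reports each failed map-output fetch to the AM. Once enough reports accumulate for one map attempt, the AM treats that output as lost and re-runs the map on a live node. Below is a minimal Java sketch of that AM-side bookkeeping, assuming a simple per-attempt report counter. `FetchFailureTracker`, `reportFetchFailure`, the attempt id and the threshold of 3 are hypothetical names and values, not the actual MRv2 AM code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of AM-side fetch-failure handling. Names and the
 * threshold are illustrative assumptions, not Hadoop's implementation.
 */
public class FetchFailureTracker {
    private final int maxNotifications;                       // assumed re-run threshold
    private final Map<String, Integer> failures = new HashMap<>();
    private final Set<String> rescheduled = new HashSet<>();

    public FetchFailureTracker(int maxNotifications) {
        if (maxNotifications < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.maxNotifications = maxNotifications;
    }

    /**
     * Records one fetch-failure report from a reducer for a map attempt.
     * Returns true exactly once, when the report count reaches the threshold,
     * meaning the map output should be treated as lost and the map re-run.
     */
    public boolean reportFetchFailure(String mapAttemptId) {
        if (rescheduled.contains(mapAttemptId)) {
            return false;                                     // re-run already triggered
        }
        int count = failures.merge(mapAttemptId, 1, Integer::sum);
        if (count >= maxNotifications) {
            rescheduled.add(mapAttemptId);
            System.out.println("AM: re-running map " + mapAttemptId
                    + " after " + count + " fetch-failure reports");
            return true;
        }
        return false;
    }

    /** Number of fetch-failure reports counted so far for a map attempt. */
    public int failureCount(String mapAttemptId) {
        return failures.getOrDefault(mapAttemptId, 0);
    }

    public static void main(String[] args) {
        FetchFailureTracker am = new FetchFailureTracker(3);
        String lostMap = "attempt_m_000001_0";                // hypothetical attempt on the dead NM
        for (int i = 1; i <= 4; i++) {
            boolean rerun = am.reportFetchFailure(lostMap);
            System.out.println("report " + i + " -> rerun=" + rerun);
        }
    }
}
```

With a threshold of 3, the third report triggers the re-run and later reports for the same attempt are ignored. In the failing scenario, by contrast, the reducer hit its own limit (MAX_FAILED_UNIQUE_FETCHES) and the whole job failed.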
> If the NodeManager on which the map task executed goes down before the
> map output is consumed by the reducer, then the job fails with a shuffle
> error
> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-4030
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4030
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Reporter: Nishan Shetty
>
> My cluster has 2 NMs.
> The value of "mapreduce.job.reduce.slowstart.completedmaps" is set to 1.
> While the job was in progress and the mappers had reached about 99%
> completion, one of the NMs went down.
> The job failed with the following trace:
> Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#1
>     at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:123)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:371)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:148)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:143)
> Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>     at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.checkReducerHealth(ShuffleScheduler.java:253)
>     at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.copyFailed(ShuffleScheduler.java:187)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:240)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:152)