[ https://issues.apache.org/jira/browse/MAPREDUCE-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Joseph Evans updated MAPREDUCE-4030:
-------------------------------------------

    Target Version/s: 0.23.3, 2.0.0, 3.0.0  (was: 0.23.2)

> If the nodemanager on which the maptask is executed is going down before the mapoutput is consumed by the reducer,then the job is failing with shuffle error
> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4030
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4030
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>            Reporter: Nishan Shetty
>            Assignee: Devaraj K
>
> My cluster has 2 NMs.
> The value of "mapreduce.job.reduce.slowstart.completedmaps" is set to 1.
> While the job was running and the mappers had reached about 99% completion, one of the NMs went down.
> The job failed with the following trace:
> "Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#1
>     at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:123)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:371)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:148)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:143)
> Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>     at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.checkReducerHealth(ShuffleScheduler.java:253)
>     at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.copyFailed(ShuffleScheduler.java:187)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:240)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:152)"