[jira] [Commented] (MAPREDUCE-4030) If the nodemanager on which the maptask is executed is going down before the mapoutput is consumed by the reducer,then the job is failing with shuffle error

Nishan Shetty (Commented) (JIRA) Mon, 19 Mar 2012 21:46:25 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233201#comment-13233201
 ]


Nishan Shetty commented on MAPREDUCE-4030:
------------------------------------------

I don't see any entry in the AM log showing that the AM was notified of the 
map output copy failure in the reducer, or that the AM relaunched the map task.
                
> If the nodemanager on which the maptask is executed is going down before the 
> mapoutput is consumed by the reducer,then the job is failing with shuffle 
> error
> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4030
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4030
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>            Reporter: Nishan Shetty
>
> My cluster has 2 NMs.
> The value of "mapreduce.job.reduce.slowstart.completedmaps" is set to 1.
> While the job was running and the mappers were about 99% complete, one of 
> the NMs went down.
> The job failed with the following trace:
> "Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#1
>     at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:123)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:371)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:148)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:143)
> Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>     at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.checkReducerHealth(ShuffleScheduler.java:253)
>     at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.copyFailed(ShuffleScheduler.java:187)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:240)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:152)"
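
The "Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out." message in the trace suggests that the reducer gives up once too many distinct map outputs have failed to copy. The following is only a toy model of that idea, not Hadoop's code: the class name, method names, and the default limit of 5 are assumptions made for illustration, and the real check in ShuffleScheduler may take other conditions into account that are not modeled here.

```python
class ShuffleError(Exception):
    """Hypothetical stand-in for the shuffle failure seen in the trace."""


class FetchFailureTracker:
    """Toy model: abort once enough *distinct* map outputs have failed to copy."""

    def __init__(self, total_maps, max_failed_unique_fetches=5):
        # Assumed rule: never require more distinct failures than there are maps.
        self.limit = min(total_maps, max_failed_unique_fetches)
        # map id -> number of failed copy attempts for that map's output
        self.failure_counts = {}

    def copy_failed(self, map_id):
        """Record one failed fetch; raise when the distinct-failure limit is hit."""
        self.failure_counts[map_id] = self.failure_counts.get(map_id, 0) + 1
        if len(self.failure_counts) >= self.limit:
            raise ShuffleError("Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.")

    def copy_succeeded(self, map_id):
        """A successful copy clears any failures recorded for that map output."""
        self.failure_counts.pop(map_id, None)
```

In this model, repeated failures against the same map output count only once toward the limit. When an NM holding many map outputs goes down, every one of those outputs fails independently, so the limit can be reached quickly unless the maps are rerun elsewhere first.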

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
