[jira] [Commented] (MAPREDUCE-6633) AM should retry map attempts if the reduce task encounters commpression related errors.

Jason Lowe (JIRA) Fri, 10 Nov 2017 13:24:18 -0800

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16248044#comment-16248044
 ]


Jason Lowe commented on MAPREDUCE-6633:
---------------------------------------

If this is very repeatable and seemingly tied only to a specific job while 
others run fine then it sounds like it's not the same type of problem this JIRA 
was addressing, i.e.: better recovery handling for corrupted compressed files.  
If it is a software bug, the stacktrace should be very indicative in this case 
where the bug probably lies -- if the stacktrace shows the reducer is in the 
shuffle phase then it is likely an issue in the Hadoop framework since it 
handles almost everything in the shuffle phase.  If the stacktrace indicates 
the reducer is in the reduce phase then the balance tips more towards user code 
in the reducer.  Anyway at this point this discussion is rapidly diverging from 
the purpose of this JIRA and should be tracked in a separate JIRA if this 
indeed looks like a bug in the Hadoop framework.

> AM should retry map attempts if the reduce task encounters commpression 
> related errors.
> ---------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6633
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6633
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.7.2
>            Reporter: Rushabh S Shah
>            Assignee: Rushabh S Shah
>             Fix For: 2.8.0, 2.7.3, 3.0.0-alpha1
>
>         Attachments: MAPREDUCE-6633.patch
>
>
> When reduce task encounters compression related errors, AM  doesn't retry the 
> corresponding map task.
> In one of the case we encountered, here is the stack trace.
> {noformat}
> 2016-01-27 13:44:28,915 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : 
> org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in 
> shuffle in fetcher#29
>       at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
>       at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>       at 
> com.hadoop.compression.lzo.LzoDecompressor.setInput(LzoDecompressor.java:196)
>       at 
> org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:104)
>       at 
> org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
>       at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
>       at 
> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.shuffle(InMemoryMapOutput.java:97)
>       at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:537)
>       at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:336)
>       at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
> {noformat}
> In this case, the node on which the map task ran had a bad drive.
> If the AM had retried running that map task somewhere else, the job 
> definitely would have succeeded.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (MAPREDUCE-6633) AM should retry map attempts if the reduce task encounters commpression related errors.

Reply via email to