[ https://issues.apache.org/jira/browse/MAPREDUCE-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257634#comment-13257634 ]

David Spies commented on MAPREDUCE-2389:
----------------------------------------

I've seen the exact same "Map output lost" exception as Todd, using Amazon EMR.
It wouldn't matter much, except that it occurs during the reduce phase, after
all the map tasks have already completed. The entire job then stalls while that
one map task is re-run, and it happens often enough to grind the whole job to a
halt.
Does anyone know whether there's a way to use an older version of Jetty with
Amazon EMR?

                
> Spurious EOFExceptions reading SpillRecord index files
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-2389
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2389
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.22.0
>         Environment: Seen on RHEL 5.5, RHEL 6.0, local dirs on ext3, Java 
> 6u20 and 6u24
>            Reporter: Todd Lipcon
>            Priority: Critical
>         Attachments: stap-output.txt
>
>
> In large jobs, I see around 1 shuffle fetch out of every million fetches fail 
> with an EOFException reading the SpillRecord index file. After lots of 
> investigation, including systemtap, it looks like the read() syscall is 
> actually returning a premature "0" result for no reason, so this is likely a 
> kernel or filesystem bug which is exacerbated by some workload the TT does.
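
Here is how a premature 0 from read() turns into an EOFException. Java's
FileInputStream.read() reports end-of-stream (-1) when the underlying read()
syscall returns 0. DataInputStream therefore sees a short file and throws
EOFException, even though the index file on disk is complete.

This is a minimal, illustrative sketch. It does not use Hadoop's SpillRecord
class. TruncatingInputStream, readIndex and ENTRY_BYTES are hypothetical names.
The per-partition layout of three 8-byte longs is an assumption, and the early
EOF is simulated rather than reproduced from the kernel/filesystem bug.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class PrematureEofDemo {
    // Hypothetical wrapper: reports end-of-stream (-1) once `limit` bytes have
    // been delivered, mimicking a read() syscall that returns 0 too early.
    static class TruncatingInputStream extends FilterInputStream {
        private int remaining;

        TruncatingInputStream(InputStream in, int limit) {
            super(in);
            this.remaining = limit;
        }

        @Override
        public int read() throws IOException {
            if (remaining <= 0) return -1;
            int b = super.read();
            if (b >= 0) remaining--;
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            if (remaining <= 0) return -1;
            int n = super.read(buf, off, Math.min(len, remaining));
            if (n > 0) remaining -= n;
            return n;
        }
    }

    // Assumed index layout: three longs (24 bytes) per partition.
    static final int ENTRY_BYTES = 24;

    // Reads every long of the index; DataInputStream throws EOFException
    // as soon as the stream reports end-of-stream mid-record.
    static long[] readIndex(InputStream in, int partitions) throws IOException {
        DataInputStream din = new DataInputStream(in);
        long[] entries = new long[partitions * 3];
        for (int i = 0; i < entries.length; i++) {
            entries[i] = din.readLong();
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        int partitions = 4;
        byte[] file = new byte[partitions * ENTRY_BYTES]; // complete index bytes

        long[] ok = readIndex(new ByteArrayInputStream(file), partitions);
        System.out.println("read " + ok.length + " longs");

        // Same complete bytes, but the stream "ends" 10 bytes early.
        try {
            readIndex(new TruncatingInputStream(
                    new ByteArrayInputStream(file), file.length - 10), partitions);
        } catch (EOFException e) {
            System.out.println("EOFException despite a complete file");
        }
    }
}
```

Running main prints "read 12 longs" and then "EOFException despite a complete
file". From the reader's side, a spurious 0 from the syscall cannot be told
apart from a genuinely truncated index file.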
