[ https://issues.apache.org/jira/browse/MAPREDUCE-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028941#comment-13028941 ]

John Novatnack commented on MAPREDUCE-2389:
-------------------------------------------

Todd: Thanks, we will try that. Most of the time it just seems to cause 
occasional task failures, but we have also seen the exception hit a 
handful of slaves simultaneously and eventually cause the jobtracker to run out 
of heap. Right now the jobtracker heap is 1 GB with 90 slaves, so we'll 
try increasing it and see whether the problem persists with that and the older 
version of Jetty.

OS: Ubuntu 10.10
JVM:
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)

> Spurious EOFExceptions reading SpillRecord index files
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-2389
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2389
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.22.0
>         Environment: Seen on RHEL 5.5, RHEL 6.0, local dirs on ext3, Java 
> 6u20 and 6u24
>            Reporter: Todd Lipcon
>            Priority: Critical
>         Attachments: stap-output.txt
>
>
> In large jobs, I see roughly one shuffle fetch in every million fail 
> with an EOFException while reading the SpillRecord index file. After lots of 
> investigation, including systemtap, it looks like the read() syscall is 
> actually returning a premature "0" result for no apparent reason, so this is 
> likely a kernel or filesystem bug that is exacerbated by some workload the 
> TaskTracker (TT) generates.
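The failure mode described in the issue can be illustrated with a small, self-contained Java sketch. It is not Hadoop's SpillRecord code: the class and method names (SpuriousEofDemo, PrematureEofStream, readLastStartOffset) are hypothetical, the 24-byte record layout (three longs per partition, no checksum) is an assumption for illustration, and it assumes the JVM surfaces a 0-byte read() syscall result as end-of-stream (-1). Under those assumptions, a reader that expects a fixed-size index and uses DataInputStream.readFully turns one premature 0 into an EOFException, even though the bytes are still on disk.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

public class SpuriousEofDemo {
    // Hypothetical layout: three 8-byte longs per partition
    // (start offset, raw length, compressed length); no checksum.
    public static final int RECORD_LENGTH = 24;

    // Simulates a read() that reports end-of-file after `limit` bytes,
    // i.e. a premature 0-byte syscall result seen as -1 in Java.
    public static final class PrematureEofStream extends FilterInputStream {
        private long remaining;

        public PrematureEofStream(InputStream in, long limit) {
            super(in);
            this.remaining = limit;
        }

        @Override
        public int read() throws IOException {
            if (remaining <= 0) return -1;
            int b = super.read();
            if (b >= 0) remaining--;
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            if (remaining <= 0) return -1;
            int n = super.read(buf, off, (int) Math.min(len, remaining));
            if (n > 0) remaining -= n;
            return n;
        }
    }

    // Builds an index with `partitions` records of the hypothetical layout.
    public static byte[] buildIndex(int partitions) {
        ByteBuffer bb = ByteBuffer.allocate(partitions * RECORD_LENGTH);
        for (int p = 0; p < partitions; p++) {
            bb.putLong(p * 100L); // start offset
            bb.putLong(100L);     // raw length
            bb.putLong(50L);      // compressed length
        }
        return bb.array();
    }

    // Reads all records with readFully and returns the last start offset.
    // readFully loops over short reads, but throws EOFException as soon as
    // the underlying stream returns -1 before the buffer is full.
    public static long readLastStartOffset(InputStream raw, int partitions) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] buf = new byte[partitions * RECORD_LENGTH];
        in.readFully(buf);
        return ByteBuffer.wrap(buf).getLong((partitions - 1) * RECORD_LENGTH);
    }

    public static void main(String[] args) throws IOException {
        byte[] index = buildIndex(4); // 4 * 24 = 96 bytes
        System.out.println(readLastStartOffset(new ByteArrayInputStream(index), 4)); // prints 300
        try {
            // The stream "ends" after 40 of the 96 bytes.
            readLastStartOffset(new PrematureEofStream(new ByteArrayInputStream(index), 40), 4);
        } catch (EOFException e) {
            System.out.println("EOFException: index read ended early");
        }
    }
}
```

Because readFully retries short positive reads but treats -1 as final, a transient premature 0 from the kernel cannot be told apart from a truncated file at this layer.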

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
