[ https://issues.apache.org/jira/browse/MAPREDUCE-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated MAPREDUCE-2389:
-----------------------------------

    Attachment: stap-output.txt

Attached is partial output from a systemtap script I wrote that tracks all 
access to files named *.out.index. You can see the file is opened and 
read successfully a couple of times, but on the third read it gets a spurious 
EOF while reading the last 8 bytes of the file.

I manually verified this file on disk and it is not actually truncated or 
modified in any way.
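For context, here is a sketch (not the Hadoop SpillRecord code) of how a premature 0 from the read() syscall ends up as an EOFException in Java. FileInputStream reports a 0-byte read as end-of-stream (-1), so any readFully-style loop that still needs the final bytes has no choice but to throw, even though the file on disk is intact. The class and helper names below are hypothetical; the helper only adds the on-disk file length to the exception message so a spurious EOF can be told apart from real truncation.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class IndexReadCheck {
    /**
     * Hypothetical helper: reads exactly buf.length bytes, or throws an
     * EOFException that records how far the read got and how long the
     * file was when it was opened.
     */
    static void readFullyWithContext(InputStream in, byte[] buf, long fileLen)
            throws IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) {
                // The stream reported end-of-stream before buf was filled.
                // If off < fileLen, the file is longer than what we managed
                // to read, i.e. the EOF did not come from real truncation.
                throw new EOFException("premature EOF after " + off + " of "
                        + buf.length + " bytes; file length on disk is " + fileLen);
            }
            off += n;
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a small stand-in for an index file (contents are arbitrary).
        Path p = Files.createTempFile("example", ".out.index");
        Files.write(p, new byte[32]);
        long len = Files.size(p);
        byte[] buf = new byte[(int) len];
        try (InputStream in = Files.newInputStream(p)) {
            readFullyWithContext(in, buf, len);
        }
        System.out.println("read " + buf.length + " bytes");
        Files.delete(p);
    }
}
```

Logging the file length alongside the offset is what distinguishes the case reported here (the EOF arrives before the known end of the file) from a genuinely short file.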

> Spurious EOFExceptions reading SpillRecord index files
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-2389
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2389
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.22.0
>         Environment: Seen on RHEL 5.5, RHEL 6.0, local dirs on ext3, Java 
> 6u20 and 6u24
>            Reporter: Todd Lipcon
>            Priority: Critical
>         Attachments: stap-output.txt
>
>
> In large jobs, I see around 1 out of every million shuffle fetches fail 
> with an EOFException reading the SpillRecord index file. After lots of 
> investigation, including systemtap, it looks like the read() syscall is 
> actually returning a premature 0 (end-of-file) with no apparent cause, so 
> this is likely a kernel or filesystem bug that some aspect of the 
> TaskTracker's workload triggers more often.

