[
https://issues.apache.org/jira/browse/MAPREDUCE-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Todd Lipcon updated MAPREDUCE-2389:
-----------------------------------
Attachment: stap-output.txt
Attached is partial output from a systemtap script I wrote that tracks all
accesses to files named *.out.index. You can see the file is opened and read
successfully a couple of times, but the third time it hits a spurious EOF
while reading the last 8 bytes of the file.
I manually verified that the file on disk is not actually truncated or
modified in any way.
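
As a rough illustration of the symptom, the Python sketch below reads the last
8 bytes of a file and raises an error if a read returns 0 bytes before the
requested range has been consumed, which is what a premature EOF looks like
from userspace. It is not taken from the attached systemtap script or from the
Hadoop code; the helper names read_exact and read_tail are hypothetical, and
os.pread is assumed to be available (Unix, Python 3.3+).

```python
import os


def read_exact(fd, offset, length):
    """Read exactly `length` bytes starting at `offset`.

    Raises EOFError if a read returns 0 bytes before `length` bytes have
    been collected, reporting the size that fstat() sees at that moment.
    (Hypothetical helper, for illustration only.)
    """
    chunks = []
    pos = offset
    remaining = length
    while remaining > 0:
        chunk = os.pread(fd, remaining, pos)
        if not chunk:
            # A 0-byte read means EOF as far as the caller can tell.
            size_now = os.fstat(fd).st_size
            raise EOFError(
                "EOF at offset %d, wanted %d more bytes, fstat size is %d"
                % (pos, remaining, size_now)
            )
        chunks.append(chunk)
        pos += len(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_tail(path, tail_len=8):
    """Return the last `tail_len` bytes of the file at `path`."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        if size < tail_len:
            raise ValueError("file is shorter than %d bytes" % tail_len)
        return read_exact(fd, size - tail_len, tail_len)
    finally:
        os.close(fd)
```

If the EOFError message ever reports an fstat size larger than the offset at
which EOF was seen, the file was not actually short, which would match the
behaviour described above.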
> Spurious EOFExceptions reading SpillRecord index files
> ------------------------------------------------------
>
> Key: MAPREDUCE-2389
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2389
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: tasktracker
> Affects Versions: 0.22.0
> Environment: Seen on RHEL 5.5, RHEL 6.0, local dirs on ext3, Java
> 6u20 and 6u24
> Reporter: Todd Lipcon
> Priority: Critical
> Attachments: stap-output.txt
>
>
> In large jobs, I see around 1 shuffle fetch out of every million fail with an
> EOFException while reading the SpillRecord index file. After a lot of
> investigation, including with systemtap, it looks like the read() syscall is
> actually returning a premature "0" result with no apparent cause, so this is
> likely a kernel or filesystem bug that is exacerbated by some workload the TT
> generates.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira