[
https://issues.apache.org/jira/browse/MAPREDUCE-4770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jaikannan Ramamoorthy updated MAPREDUCE-4770:
---------------------------------------------
Affects Version/s: 0.20.203.0
> Hadoop jobs failing with FileNotFound Exception while the job is still running
> ------------------------------------------------------------------------------
>
> Key: MAPREDUCE-4770
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4770
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: tasktracker
> Affects Versions: 0.20.203.0
> Reporter: Jaikannan Ramamoorthy
>
> We are seeing a strange issue in our Hadoop cluster. Some of our jobs fail
> with a FileNotFoundException (see below). The files in the "attempt_*"
> directory, and the directory itself, are getting deleted while the task is
> still running on the host. From the Hadoop documentation I see that the job
> directory gets wiped out when a KillJobAction is received; however, I am not
> sure why it gets wiped out while the job is still running.
> My question is: what could be deleting it while the job is running? Any
> thoughts or pointers on how to debug this would be helpful.
> Thanks!
> java.io.FileNotFoundException: /hadoop/mapred/local_data/taskTracker//jobcache/job_201211030344_15383/attempt_201211030344_15383_m_000169_0/output/spill29.out (Permission denied)
>     at java.io.FileInputStream.open(Native Method)
>     at java.io.FileInputStream.<init>(FileInputStream.java:120)
>     at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:71)
>     at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:107)
>     at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:177)
>     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:400)
>     at org.apache.hadoop.mapred.Merger$Segment.init(Merger.java:205)
>     at org.apache.hadoop.mapred.Merger$Segment.access$100(Merger.java:165)
>     at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:418)
>     at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:381)
>     at org.apache.hadoop.mapred.Merger.merge(Merger.java:77)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1692)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1322)
>     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:698)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>     at org.apache.hadoop.mapred.Child.main(Child.java:253)