[ https://issues.apache.org/jira/browse/HADOOP-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514363 ]
Christian Kunz commented on HADOOP-1612:
----------------------------------------
In many of our applications we use the PIPES interface and create multiple
files using libhdfs in the output directory of the task. We rely on the
framework to promote the output directory of the winning task among
speculatively executed tasks, and to kill and completely clean up the losing tasks.
Failure to list files in the output directory is just one of several symptoms
of this issue. We occasionally lose files after moving them from the output
directory to some other destination, or we still find files in temporary task
subdirectories of the output directory. For example, we see log entries like
the following (this one was written 17 seconds after job completion, by a
killed speculative task):
NOTICE(06:58:08,3086960320): Rename
<outputDir>/_task_0120_r_000004_1/4/<filename> to <destinationFile>
I also see log entries on the NameNode, several seconds after job completion,
that look like this:
2007-07-19 23:57:53,791 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
NameSystem.allocateBlock: <outputDir>/_task_0120_r_000068_1/68/<filename>
blk_5152182437135923221 is created and added to pendingCreates and
pendingCreateBlocks
From this it looks as if the framework is still active in the output
directory even after job completion.
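
Until the framework side is fixed, an application that moves files out of the output directory could at least avoid touching anything that still sits under a temporary task subdirectory. The following is only a minimal sketch of such a check, not part of Hadoop or libhdfs. The function names are hypothetical, and the directory naming pattern is an assumption inferred solely from the two log entries above (`_task_0120_r_000004_1`, `_task_0120_r_000068_1`); the `m` alternative for map tasks is likewise assumed.

```python
import re

# Assumption: task-attempt directories are named
# _task_<job>_<m|r>_<task>_<attempt>, as inferred from the log entries above.
TASK_DIR = re.compile(r"_task_\d+_[mr]_\d+_\d+")


def is_task_temp_path(path):
    """Return True if any component of path is a task-attempt directory."""
    return any(TASK_DIR.fullmatch(part) for part in path.split("/"))


def promoted_only(paths):
    """Keep only paths that are not under a temporary task subdirectory."""
    return [p for p in paths if not is_task_temp_path(p)]
```

This does not address the race itself. It only keeps a consumer from renaming files that were never promoted, which is the situation the Rename log entry shows.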
> listing of an output directory shortly after job completion fails
> -----------------------------------------------------------------
>
> Key: HADOOP-1612
> URL: https://issues.apache.org/jira/browse/HADOOP-1612
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.14.0
> Reporter: Christian Kunz
> Assignee: Arun C Murthy
> Priority: Blocker
> Fix For: 0.14.0
>
>
> Sometimes, after a job finishes and another application tries to rename DFS
> files created by that job, listing the output directory containing the
> newly created files fails. File creation and directory listing are done via
> libhdfs, but it is unlikely that this makes any difference, so I am filing
> this under the mapred component.
> It might be a race condition: does the job complete before the files in the
> output directory are promoted?