[ 
https://issues.apache.org/jira/browse/HADOOP-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514412
 ] 

Arun C Murthy commented on HADOOP-1612:
---------------------------------------

Thanks for the info, Christian...

bq. From this it looks as if the framework is still active with the output 
directory even after job completion.

Ah, this is entirely feasible, and here is how:
The job is declared a *success* as soon as all the {{TIP}}s are complete (and 
clearly at this point all the successful sub-task files have been promoted). 
However, speculative tasks of the completed {{TIP}}s might still be active at 
that point. When they are killed (or succeed), the framework just discards 
their output files (the *_1* directories in your logs)... and those discards 
are precisely what your logs show.
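
To make the ordering concrete, here is a toy model of that sequence. It is 
only a sketch and not Hadoop code: the class, the function, the attempt names 
and the leading-underscore temporary naming are all hypothetical, chosen just 
to show that a listing taken right at job success can still contain a 
speculative attempt's directory that disappears moments later.

```python
# Toy model of the sequence described above; this is NOT Hadoop code.
# OutputDir, run_job, the attempt names and the "_" temp prefix are hypothetical.

class OutputDir:
    """In-memory stand-in for a job output directory."""

    def __init__(self):
        self.entries = set()

    def create_temp(self, attempt):
        # Each task attempt writes into its own temporary directory.
        self.entries.add("_" + attempt)

    def promote(self, attempt, final_name):
        # A successful attempt's output is moved to its final name.
        self.entries.discard("_" + attempt)
        self.entries.add(final_name)

    def discard(self, attempt):
        # Output of a killed (or redundant) attempt is simply dropped.
        self.entries.discard("_" + attempt)

    def listing(self):
        return sorted(self.entries)


def run_job():
    out = OutputDir()

    # One TIP with two attempts: the original (_0) and a speculative one (_1).
    out.create_temp("task_0001_r_000000_0")
    out.create_temp("task_0001_r_000000_1")

    # Attempt _0 finishes first; its output is promoted and the TIP is complete.
    out.promote("task_0001_r_000000_0", "part-00000")
    job_succeeded = True  # all TIPs complete, so the job is declared a success

    # A client listing the directory at this instant still sees the _1 directory.
    at_success = out.listing()

    # Only later is the speculative attempt killed and its output discarded.
    out.discard("task_0001_r_000000_1")
    after_cleanup = out.listing()

    return job_succeeded, at_success, after_cleanup
```

Calling {{run_job()}} returns the two listings side by side: the first still 
contains the speculative attempt's directory next to the promoted file, the 
second contains only the promoted file.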

Of course, I'll keep poking at this, and I'd appreciate any further info from you...

> listing of an output directory shortly after job completion fails
> -----------------------------------------------------------------
>
>                 Key: HADOOP-1612
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1612
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.14.0
>            Reporter: Christian Kunz
>            Assignee: Arun C Murthy
>            Priority: Blocker
>             Fix For: 0.14.0
>
>
> Sometimes, after a job finishes, and another application wants to rename dfs 
> files created by that job, listing of the output directory containing the 
> newly created files fails. File creation and directory listing are done via 
> libhdfs, but it is unlikely that this makes any difference, so I am filing 
> this under the mapred component.
> It might be a race condition: does the job complete before the files in the 
> output directory are promoted?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
