[ 
https://issues.apache.org/jira/browse/HADOOP-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12570827#action_12570827
 ] 

Olga Natkovich commented on HADOOP-2815:
----------------------------------------

One of the things that Pig does is chain M-R jobs together. The data passed 
between jobs is stored in temporary files. The lifetime of these files depends 
heavily on the particular job that is running. It is easy to imagine large 
jobs running for hours or even days.

The $TRASH option sounds interesting. Koji, could you point me to how to use 
it? Thanks.

> support for DeleteOnExit
> ------------------------
>
>                 Key: HADOOP-2815
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2815
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Olga Natkovich
>
> Pig creates temp files that it wants removed at the end of processing. The 
> code that removes the temp files is in a shutdown hook so that they get 
> removed both on normal shutdown and when the process gets killed.
> The problem that we are seeing is that by the time the code is called the DFS 
> might already be closed and the delete fails leaving temp files behind. Since 
> we have no control over the shutdown order, we have no way to make sure that 
> the files get removed.
> One way to solve this issue is to be able to mark the files as temp files so 
> that hadoop can remove them during its shutdown.
> The stack trace I am seeing is
>         at org.apache.hadoop.dfs.DFSClient.checkOpen(DFSClient.java:158)
>         at org.apache.hadoop.dfs.DFSClient.delete(DFSClient.java:417)
>         at org.apache.hadoop.dfs.DistributedFileSystem.delete(DistributedFileSystem.java:144)
>         at org.apache.pig.backend.hadoop.datastorage.HPath.delete(HPath.java:96)
>         at org.apache.pig.impl.io.FileLocalizer$1.run(FileLocalizer.java:275)
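
To make the request in the description concrete, here is a minimal sketch of 
the "mark as temp, let the client delete on its own shutdown" idea. It is not 
Hadoop or Pig code: DeleteOnExitSketch, its deleteOnExit() method, and the 
error message are hypothetical names chosen for illustration, and local files 
(java.nio) stand in for DFS so the example runs without a cluster. The point 
it tries to show is that if the client itself removes the marked files inside 
close(), the result no longer depends on the order in which independent 
shutdown hooks run.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical stand-in for a file system client (assumption: not the real
 * DFSClient). Paths marked with deleteOnExit() are removed inside close(),
 * before the client stops accepting calls.
 */
public class DeleteOnExitSketch implements AutoCloseable {
    private final Set<Path> markedForDelete = new LinkedHashSet<>();
    private boolean open = true;

    /** Loosely mirrors the checkOpen() frame in the stack trace above. */
    private void checkOpen() throws IOException {
        if (!open) {
            throw new IOException("client already closed"); // illustrative message
        }
    }

    /** An ordinary delete: fails if the client has already been closed. */
    public synchronized boolean delete(Path p) throws IOException {
        checkOpen();
        return Files.deleteIfExists(p);
    }

    /** Marks a path as temporary so that close() will remove it. */
    public synchronized void deleteOnExit(Path p) throws IOException {
        checkOpen();
        markedForDelete.add(p);
    }

    /** Deletes every marked path first, then marks the client as closed. */
    @Override
    public synchronized void close() {
        if (!open) {
            return;
        }
        for (Path p : markedForDelete) {
            try {
                Files.deleteIfExists(p);
            } catch (IOException e) {
                System.err.println("Could not delete " + p + ": " + e.getMessage());
            }
        }
        markedForDelete.clear();
        open = false;
    }

    public static void main(String[] args) throws IOException {
        DeleteOnExitSketch fs = new DeleteOnExitSketch();
        Path tmp = Files.createTempFile("pig-intermediate-", ".tmp");
        fs.deleteOnExit(tmp);
        // A single hook owned by the client: the caller needs no hook of its
        // own, so there is no ordering between two hooks to get wrong.
        Runtime.getRuntime().addShutdownHook(new Thread(fs::close));
    }
}
```

In this sketch a caller that invokes delete() after close() still gets an 
IOException, which is the failure mode in the stack trace; a caller that 
used deleteOnExit() beforehand does not need to make that late call at all.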

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
