[ https://issues.apache.org/jira/browse/HADOOP-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12571616#action_12571616 ]
Olga Natkovich commented on HADOOP-2815:
----------------------------------------
That would be ideal, but I am afraid I can't wait for that to be added, since
with Hadoop 0.16 we seem to be leaving data behind quite frequently. The best
option for me for now seems to be to do my own thing, but I would definitely
transition to the supported API as soon as it becomes available.
Also, perhaps, instead of extending trash, which was built for a different use
and might or might not be enabled on the cluster, a separate mechanism could be
built for storing temp files, one that is always present and in which files and
directories can be marked as deleteOnExit. Pig mainly uses temp files to store
data between map and reduce jobs, and anybody who runs chains of those would
benefit. The other important use case would be checkpointing, which we are also
considering adding to Pig. Let me know if you would like me to open a
separate JIRA on that or change this one.
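To make the request concrete, here is a minimal sketch of the behavior I have
in mind. It does not use any Hadoop API: the class name TempFileRegistry and
its methods are hypothetical, and the local file system (java.nio.file) stands
in for DFS. The point it illustrates is that the client itself deletes the
marked paths as part of its own close(), so the cleanup no longer depends on
the order in which shutdown hooks run.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical stand-in for a file system client that supports deleteOnExit.
 * Not a Hadoop class; the local file system plays the role of DFS here.
 */
final class TempFileRegistry implements AutoCloseable {
    // Paths (files or directories) to remove when the client closes.
    private final Set<Path> marked = new LinkedHashSet<>();
    private boolean open = true;

    /** Marks a file or directory for deletion at close time. */
    public synchronized void deleteOnExit(Path path) {
        if (!open) {
            throw new IllegalStateException("client already closed");
        }
        marked.add(path);
    }

    /** Deletes every marked path first, and only then marks the client closed. */
    @Override
    public synchronized void close() throws IOException {
        if (!open) {
            return; // closing twice is a no-op
        }
        for (Path path : marked) {
            deleteRecursively(path);
        }
        marked.clear();
        open = false;
    }

    // Removes children before their parents so directories are empty when deleted.
    private static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return; // already gone, nothing to do
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.deleteIfExists(p);
        }
    }
}
```

With something like this, Pig would call deleteOnExit once per temp directory
when it creates it, and would not need its own shutdown hook for the delete.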
> support for DeleteOnExit
> ------------------------
>
> Key: HADOOP-2815
> URL: https://issues.apache.org/jira/browse/HADOOP-2815
> Project: Hadoop Core
> Issue Type: New Feature
> Components: dfs
> Reporter: Olga Natkovich
>
> Pig creates temp files that it wants removed at the end of
> processing. The code that removes the temp files is in a shutdown hook so
> that they get removed both on normal shutdown and when the process gets
> killed.
> The problem we are seeing is that by the time the code is called, the DFS
> might already be closed and the delete fails, leaving temp files behind. Since
> we have no control over the shutdown order, we have no way to make sure that
> the files get removed.
> One way to solve this issue is to be able to mark the files as temp files so
> that Hadoop can remove them during its shutdown.
> The stack trace I am seeing is
> at org.apache.hadoop.dfs.DFSClient.checkOpen(DFSClient.java:158)
> at org.apache.hadoop.dfs.DFSClient.delete(DFSClient.java:417)
> at org.apache.hadoop.dfs.DistributedFileSystem.delete(DistributedFileSystem.java:144)
> at org.apache.pig.backend.hadoop.datastorage.HPath.delete(HPath.java:96)
> at org.apache.pig.impl.io.FileLocalizer$1.run(FileLocalizer.java:275)