[ https://issues.apache.org/jira/browse/PIG-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13666738#comment-13666738 ]
Bob Freitas commented on PIG-3338:
----------------------------------
The general approach of a common entry point, which creates the temp file and
adds a reference to a data structure stored in a thread local, is very sound.
The specific data structure does not matter much, since it will be accessed
more or less sequentially at completion or termination anyway.
This approach works equally well whether it's HDFS or the local file system.
If the challenge is keeping HDFS and the local file system separate, that is
easy to resolve with either two methods, or a single method with a
discriminator and two data structures backing it.
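To make the single-method-with-discriminator variant concrete, here is a rough sketch. It is not Pig code: TempFileRegistry, Kind, create() and deleteAll() are hypothetical names, and both kinds create files on the local file system so the example stays self-contained. A real version would presumably go through the Hadoop FileSystem API for the HDFS case.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper, not part of Pig: one entry point that creates temp
// files and remembers them per thread so they can be deleted after a job.
final class TempFileRegistry {

    // Discriminator: each kind gets its own backing deque.
    enum Kind { LOCAL, HDFS }

    // One map of deques per thread, so concurrent jobs in a web app
    // only ever see (and delete) their own temp files.
    private static final ThreadLocal<Map<Kind, Deque<Path>>> TRACKED =
        ThreadLocal.withInitial(() -> {
            Map<Kind, Deque<Path>> m = new EnumMap<>(Kind.class);
            for (Kind k : Kind.values()) {
                m.put(k, new ArrayDeque<>());
            }
            return m;
        });

    private TempFileRegistry() {}

    // Creates a temp file and records it for the calling thread.
    // Assumption: both kinds use the local file system in this sketch.
    static Path create(Kind kind, String prefix, String suffix) {
        try {
            Path p = Files.createTempFile(prefix, suffix);
            TRACKED.get().get(kind).push(p);
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deletes every file the calling thread recorded for this kind and
    // returns how many were actually removed.
    static int deleteAll(Kind kind) {
        Deque<Path> files = TRACKED.get().get(kind);
        int deleted = 0;
        while (!files.isEmpty()) {
            Path p = files.pop();
            try {
                if (Files.deleteIfExists(p)) {
                    deleted++;
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return deleted;
    }
}
```

The caller would invoke deleteAll() for each kind once the job has completed or terminated, on the same thread that ran the job. Since the deque is only drained in order at that point, any simple collection would do in its place.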
> Temp files not deleted when run in Web App
> ------------------------------------------
>
> Key: PIG-3338
> URL: https://issues.apache.org/jira/browse/PIG-3338
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.11.1
> Environment: Linux CentOS 6.3, with Hadoop 1.0.4 and Jetty 6
> Reporter: Bob Freitas
> Priority: Critical
>
> We are executing Pig, via PigRunner, in a web app, but the temporary files
> being created by Pig are not being deleted until the web app is shut down.
> This causes the /tmp directory to become very cluttered very quickly, and runs
> the risk of filling it up over time.
> The work started in FileLocalizer with the deleteTempFiles() is an excellent
> start, but it does not go far enough. If it were fully implemented then we
> could use that method to delete those temp files after each Pig job has
> completed.
> What needs to happen is that the creation of temp files always needs to be
> passed through FileLocalizer.getTemporaryPath() instead of calling
> File.createTempFile() directly, as it does now.
> The places this needs to be implemented:
> 1) FileLocalizer #706 creation of localTempDir
> 2) JobControlCompiler.getJob() #512 creation of job jar
> 3) DefaultAbstractBag #388 creation of pigbag for spills
> If these temp files are then stored into the already existing
> ThreadLocal<Deque<ElementDescriptor>>() then we could use the
> FileLocalizer.deleteTempFiles() to clean up after each Pig job and not need
> to restart the web app.