I think this is related to HADOOP-1558: https://issues.apache.org/jira/browse/HADOOP-1558
Per-job cleanups that are not run client-side must be run in a separate JVM, since, as a rule, we don't run user code in long-lived daemons.
Doug

Stu Hood wrote:
Does anyone have any ideas on this issue? Otherwise, if I were to write a patch to add this option for jobs to Hadoop, would it be useful for anyone else?

Thanks,
Stu

-----Original Message-----
From: Stu Hood <[EMAIL PROTECTED]>
Sent: Fri, August 24, 2007 9:43 am
To: [email protected]
Subject: Removing files after processing

Hello,

What's the best way to go about doing cleanup after MapReduce jobs? I'd like to have the job delete its input files when it has finished successfully (but preferably before it is marked as having finished, so I don't have to deal with a race condition). Obviously, I don't want to have to track which files are being processed by each job, since that data is stored anyway. Also, I'm using JobClient.submitJob(), so I can't sit around and wait to do the cleanup manually.

Any suggestions? Thanks!

Stu Hood
Webmail.us
"You manage your business. We'll manage your email."®
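[Editor's note: until a framework-side hook along the lines of HADOOP-1558 exists, the usual workaround is to block in the submitting process, which is exactly what Stu was trying to avoid with submitJob(). For readers who can keep the client JVM alive, a minimal sketch against the classic org.apache.hadoop.mapred API follows; the SubmitAndCleanup class name and 5-second polling interval are illustrative, and exact method names (e.g. FileInputFormat.getInputPaths, the two-argument FileSystem.delete) vary slightly across Hadoop versions.]

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class SubmitAndCleanup {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(SubmitAndCleanup.class);
    // ... set mapper/reducer and input/output paths as usual ...

    // submitJob() returns immediately with a handle to the running job.
    RunningJob job = new JobClient(conf).submitJob(conf);

    // Poll until the job finishes; the interval is arbitrary.
    while (!job.isComplete()) {
      Thread.sleep(5000);
    }

    // Only delete the inputs once the job is known to have succeeded,
    // so a failed run can be retried against the original data.
    if (job.isSuccessful()) {
      FileSystem fs = FileSystem.get(conf);
      for (Path input : FileInputFormat.getInputPaths(conf)) {
        fs.delete(input, true); // true = recursive delete
      }
    }
  }
}

Note that this leaves the race Stu mentions, just reversed: the job is marked complete before its inputs disappear, so anything keyed off job completion may briefly see stale input files. Deleting inside the framework, per HADOOP-1558, would close that window.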
