There's likely another gotcha: various logs and job config files are written to the _logs directory under the output directory, so you'd need to uniquify that as well. There may be other traps, but I don't know them :)
This might be a bit of a frustrating endeavour, since you're trying to override behaviour that's been baked into Hadoop for a while. Why in particular do you need all your jobs to emit to a common directory? You could probably save yourself some headache by writing to subdirectories of a common dir, e.g., rather than having jobs 0..n write to /user/foo/commonoutput, just write to /user/foo/outputs/0, /user/foo/outputs/1, etc.

If you need to collect the various outputs together to use in a subsequent MR job, you can call FileInputFormat.addInputPath() multiple times on the various directories. Or you could modify your downstream logic to either recursively descend a level into the hierarchy, or use FileSystem.rename() to move the files from the different directories into a single aggregate directory after all the jobs have succeeded. (A rough sketch of both approaches follows the quoted message below.)

- Aaron

On Mon, Jul 20, 2009 at 11:51 AM, Thibaut_ <[email protected]> wrote:
>
> Hi,
>
> I'm trying to run a few parallel jobs which have the same input directory
> and the same output directory.
>
> I modified the FileInputClass to check for non-zero files, and also
> modified the output class to allow non-empty directories (the input
> directory = output directory in my case). I made sure that each job's
> output is unique, thus there are no file conflicts there.
>
> Everything runs fine for a while, but I'm having problems with the
> temporary directory:
> java.io.IOException: The temporary job-output directory
> hdfs://internal1:50010/user/root/0/_temporary doesn't exist!
>
> I could go further down and try to make the _temporary directory job
> dependent. But before I do that, I would like to know if there are other
> traps/errors I could run into when running parallel jobs having the same
> output/input directory?
>
> (Btw this is hadoop-0.20.0)
>
> Thanks,
> Thibaut
>
> --
> View this message in context:
> http://www.nabble.com/Running-parallel-jobs-having-the-same-output-directory-tp24575402p24575402.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
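
P.S. Here's a rough, untested sketch of what I mean, using the old org.apache.hadoop.mapred API (as in 0.20). The class name CollectOutputs, the /user/foo/outputs/<i> and /user/foo/commonoutput paths, and the numJobs parameter are just placeholders for whatever naming you end up choosing:

import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;

public class CollectOutputs {

  // Option 1: feed the per-job output dirs into one downstream job by
  // adding each directory as an input path.
  public static JobConf downstreamConf(int numJobs) {
    JobConf conf = new JobConf(CollectOutputs.class);
    for (int i = 0; i < numJobs; i++) {
      // placeholder path scheme; substitute whatever your jobs write to
      FileInputFormat.addInputPath(conf, new Path("/user/foo/outputs/" + i));
    }
    // ... set mapper/reducer, output path, etc. before submitting
    return conf;
  }

  // Option 2: after all jobs have succeeded, rename their output files
  // into a single aggregate directory. Prefixing with the job index avoids
  // name clashes between the part-XXXXX files of different jobs.
  public static void mergeOutputs(JobConf conf, int numJobs) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    Path dest = new Path("/user/foo/commonoutput");
    fs.mkdirs(dest);
    for (int i = 0; i < numJobs; i++) {
      Path src = new Path("/user/foo/outputs/" + i);
      for (FileStatus stat : fs.listStatus(src)) {
        // you may want to skip the _logs subdirectory here rather than
        // moving it along with the data files
        fs.rename(stat.getPath(), new Path(dest, i + "-" + stat.getPath().getName()));
      }
    }
  }
}

Either way you avoid fighting the _temporary bookkeeping, since each job keeps its own output directory until you're done with it.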
