You can use FileInputFormat.setInputPaths(configuration, job1-output). This will overwrite the old input path(s).
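
To make the difference concrete, here is a minimal sketch that models the append-versus-overwrite behavior with a plain Java map instead of a real Hadoop Configuration. The class InputPathModel and its two helper methods are hypothetical stand-ins written for illustration, not Hadoop code. They assume only what the thread describes: addInputPath() appends to mapred.input.dir with a comma, and setInputPaths() replaces the value.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified stand-in for the scenario in this thread; this is NOT Hadoop code. */
public class InputPathModel {
    static final String INPUT_DIR = "mapred.input.dir";

    // Hypothetical model of the append behavior: existing value + "," + new path.
    static void addInputPath(Map<String, String> conf, String path) {
        String dirs = conf.get(INPUT_DIR);
        conf.put(INPUT_DIR, dirs == null ? path : dirs + "," + path);
    }

    // Hypothetical model of the overwrite behavior: the previous value is discarded.
    static void setInputPaths(Map<String, String> conf, String... paths) {
        conf.put(INPUT_DIR, String.join(",", paths));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        // Job 1's input dir is still set when Job 2's input is appended.
        conf.put(INPUT_DIR, "job1-input");
        addInputPath(conf, "job1-output");
        System.out.println(conf.get(INPUT_DIR)); // job1-input,job1-output

        // Blanking the property first leaves an empty first list element.
        conf.put(INPUT_DIR, "");
        addInputPath(conf, "job1-output");
        System.out.println(conf.get(INPUT_DIR)); // ,job1-output

        // Overwriting leaves only the new path.
        setInputPaths(conf, "job1-output");
        System.out.println(conf.get(INPUT_DIR)); // job1-output
    }
}
```

In this model the empty string is not treated as "unset", which is why the blanked property ends up with a leading comma, while the overwrite never looks at the old value at all.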

-Joey

On Mon, Jan 16, 2012 at 7:16 PM, W.P. McNeill <[email protected]> wrote:
>
> Is it possible to unset a configuration value? I think the answer is no,
> but I want to be sure.
>
> I know that you can set a configuration value to the empty string, but I
> have a scenario in which that is not an option. I have a top level Hadoop
> Tool that launches a series of other Hadoop jobs in its run() method. The
> output of the first sub-job becomes the input of the second one and so on.
> The top-level Tool takes a configuration file which specifies parameters
> used by all the sub-jobs. It also specifies a mapred.input.dir value which
> serves as the input directory to the first sub-job.
>
> TopLevelJob() {
>   job1 = createJob1(configuration);
>   // Run job 1
>   job2 = createJob2();
>   FileInputFormat.addInputPath(configuration, job1-output)
>   // Run job 2
> }
>
> The problem is that addInputPath() appends a value to the end of
> mapred.input.dir, erroneously leaving the input directory for Job 1 on the
> list for Job 2. If I try to delete Job 1's input dir by setting
> mapred.input.dir to the empty string like so:
>
> configuration.set("mapred.input.dir", "")
>
> the addInputPath() method appends the input path, giving the value
> ",job1-output". The first element of this list is the empty string, which
> causes an Exception.
>
> I can work around this by calling configuration.set("mapred.input.dir")
> directly when creating Job 2, but this feels like a hack. It seems like the
> proper way to set input paths is via a FileInputFormat method instead of by
> setting the property directly.

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
