Is it possible to unset a configuration value? I think the answer is no,
but I want to be sure.
I know that you can set a configuration value to the empty string, but I
have a scenario in which that is not an option. I have a top-level Hadoop
Tool that launches a series of other Hadoop jobs in its run() method. The
output of the first sub-job becomes the input of the second one and so on.
The top-level Tool takes a configuration file which specifies parameters
used by all the sub-jobs. It also specifies a mapred.input.dir value which
serves as the input directory to the first sub-job.
TopLevelJob() {
    job1 = createJob1(configuration);
    // Run job 1
    job2 = createJob2();
    FileInputFormat.addInputPath(configuration, job1-output);
    // Run job 2
}
The problem is that addInputPath() appends a value to the end of
mapred.input.dir, erroneously leaving the input directory for Job 1 on the
list for Job 2. If I try to delete Job 1's input dir by setting
mapred.input.dir to the empty string like so:
configuration.set("mapred.input.dir", "")
the addInputPath() method appends the input path, giving the value
",job1-output". The first element of this list is the empty string, which
causes an Exception.
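
To make the behavior concrete, here is a minimal sketch that models the append semantics described above with a plain Map instead of a real Hadoop Configuration. It is not Hadoop's code: the class name InputDirModel, the helper addInputPath() and the path "job1-input" are made up for illustration, and the assumption is only that the property is a comma-separated list that addInputPath() appends to.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the configuration; not a Hadoop class.
public class InputDirModel {
    static final String INPUT_DIR = "mapred.input.dir";

    // Assumed append semantics: if the property is absent, store the path;
    // otherwise append "," + path to whatever value is already there.
    static void addInputPath(Map<String, String> conf, String path) {
        String dirs = conf.get(INPUT_DIR);
        conf.put(INPUT_DIR, dirs == null ? path : dirs + "," + path);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        // Job 1's input dir is still set when Job 2's input is added.
        conf.put(INPUT_DIR, "job1-input");
        addInputPath(conf, "job1-output");
        System.out.println(conf.get(INPUT_DIR)); // job1-input,job1-output

        // Setting the property to "" first leaves an empty first element.
        conf.put(INPUT_DIR, "");
        addInputPath(conf, "job1-output");
        System.out.println(conf.get(INPUT_DIR)); // ,job1-output

        // Only a truly absent key gives the value I actually want.
        conf.remove(INPUT_DIR);
        addInputPath(conf, "job1-output");
        System.out.println(conf.get(INPUT_DIR)); // job1-output
    }
}
```

In this model only the last case, where the key is removed rather than set to the empty string, produces a clean list, which is why I am asking about unsetting a value.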
I can work around this by calling configuration.set("mapred.input.dir", job1-output)
directly when creating Job 2, but this feels like a hack. It seems like the
proper way to set input paths is via a FileInputFormat method instead of by
setting the property directly.