Hadoop Streaming has -input and -output options for specifying input and output directories or file patterns, but I believe GenericOptionsParser (which ToolRunner uses for plain Java MapReduce programs) does not, so far, support those two options by name.
Instead, you will have to specify them with -D and make sure the property you set is the right one for the release you're on (for example, mapred.input.dir is deprecated in favor of mapreduce.input.fileinputformat.inputdir in later versions). Or you could write an additional options parser and let it handle such arguments (-input/-output, as Streaming does) after ToolRunner is done parsing the ones it accepts.

On Thu, Jan 27, 2011 at 12:23 AM, W.P. McNeill <[email protected]> wrote:
> I like to pass positional arguments to Hadoop processes where all but the
> last argument is an input directory and the last argument is an output
> directory. It seems like there are a couple ways to integrate this with
> specifying these directories with -D mapred.input.dir and -D
> mapred.output.dir. Is there an accepted standard way to specify Hadoop
> input and output directories?
>

--
Harsh J
www.harshj.com
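
P.S. A rough sketch of the "additional opts parser" idea, in case it helps. It assumes you run it over the arguments that ToolRunner hands to your Tool's run() method, i.e. after the generic options such as -D have already been consumed. The class name InputOutputArgs and its fields are made up for illustration and are not part of Hadoop; the code is plain Java with no Hadoop dependency.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not a Hadoop class): pulls -input/-output pairs out of
 * the arguments left over after generic option parsing.
 */
final class InputOutputArgs {
    // Every value given with -input, in command-line order.
    public final List<String> inputs = new ArrayList<>();
    // Value of the last -output seen, or null if none was given.
    public String output;
    // Arguments this parser did not recognise, in their original order.
    public final List<String> remaining = new ArrayList<>();

    public static InputOutputArgs parse(String[] args) {
        InputOutputArgs parsed = new InputOutputArgs();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (arg.equals("-input") || arg.equals("-output")) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("Missing value for " + arg);
                }
                String value = args[++i]; // consume the option's value
                if (arg.equals("-input")) {
                    parsed.inputs.add(value);
                } else {
                    parsed.output = value;
                }
            } else {
                parsed.remaining.add(arg);
            }
        }
        return parsed;
    }
}
```

Inside run(String[] args) you would then call InputOutputArgs.parse(args) and pass each entry of inputs to FileInputFormat.addInputPath and output to FileOutputFormat.setOutputPath, instead of depending on the release-specific property names.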
