
It might be nice to add a few default flags to AbstractJob that map directly
to -D arguments in Hadoop. For example, I could see having -i map to
-Dmapred.input.dir, -o to -Dmapred.output.dir, -nr to -Dmapred.num.reducers,
etc. I think it is great to be able to accept arbitrary -D arguments, but it
would be nice to accept shorthand which also gets displayed in -h output.
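
Purely as a sketch of the kind of mapping I mean (nothing like this exists
in AbstractJob today, and the helper name is invented; for the reducer count
I believe the actual Hadoop property is mapred.reduce.tasks):

    // Translate a shorthand flag into the equivalent Hadoop property before
    // the job is configured. Configuration here is
    // org.apache.hadoop.conf.Configuration.
    public static void applyShorthand(Configuration conf, String flag, String value) {
      if ("-i".equals(flag) || "--input".equals(flag)) {
        conf.set("mapred.input.dir", value);
      } else if ("-o".equals(flag) || "--output".equals(flag)) {
        conf.set("mapred.output.dir", value);
      } else if ("-nr".equals(flag) || "--numReducers".equals(flag)) {
        conf.set("mapred.reduce.tasks", value);
      }
    }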

The -D options don't get included in -h output, and as a result it is
unclear how to specify input or output for someone who is not too familiar
with Hadoop conventions.

From the API perspective, AbstractJob could provide no-arg methods like
AbstractJob.buildInputOption(), so that the class using the AbstractJob API
need not be concerned with the precise letters, parameters, and description
required for each option.
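
Hypothetically, something like the following, reusing the existing
four-argument buildOption (the long/short names and descriptions here are
just made up):

    // Convenience methods so the letters and descriptions live in one place
    // instead of in every job that needs an input or output path.
    public static Option buildInputOption() {
      return buildOption("input", "i", "Path to the job's input directory", null);
    }

    public static Option buildOutputOption() {
      return buildOption("output", "o", "Path to the job's output directory", null);
    }

    public static Option buildNumReducersOption() {
      return buildOption("numReducers", "nr", "Number of reduce tasks to use", "1");
    }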

Tangentially related, I was wondering something about AbstractJob: with the
advent of the parsedArgs map returned by AbstractJob.parseArguments, is
there still a need to pass Option arguments around? Could AbstractJob
maintain the Option state itself?

For example, from RecommenderJob:

    Option numReccomendationsOpt = AbstractJob.buildOption("numRecommendations", "n",
        "Number of recommendations per user", "10");
    Option usersFileOpt = AbstractJob.buildOption("usersFile", "u",
        "File of users to recommend for", null);
    Option booleanDataOpt = AbstractJob.buildOption("booleanData", "b",
        "Treat input as without pref values", Boolean.FALSE.toString());

    Map<String,String> parsedArgs = AbstractJob.parseArguments(
        args, numReccomendationsOpt, usersFileOpt, booleanDataOpt);
    if (parsedArgs == null) {
      return -1;
    }

Could be changed to something like:

buildOption("numRecommendations", "n", "Number of recommendations per user",
"10");
buildOption("usersFile", "u", "File of users to recommend for", null);
buildOption("booleanData", "b", "Treat input as without pref values",
Boolean.FALSE.toString());
Map<String,String> parsedArgs = parseArguments();
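
One hypothetical way to get there: the instance-level buildOption records
each Option in an internal list, and parseArguments walks that list instead
of taking varargs (createOption below stands in for whatever the current
static buildOption does, and the args array could just as well be stashed
when run() is invoked):

    // Sketch only -- AbstractJob doesn't do this today.
    private final List<Option> options = new ArrayList<Option>();

    protected Option buildOption(String name, String shortName,
                                 String description, String defaultValue) {
      Option opt = createOption(name, shortName, description, defaultValue);
      options.add(opt);
      return opt;
    }

    protected Map<String,String> parseArguments(String[] args) {
      return AbstractJob.parseArguments(args,
          options.toArray(new Option[options.size()]));
    }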

Providing a set of input validators that check the input before launching a
job sounds like a pretty cool idea too.
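
For instance (just a strawman; none of these types exist in Mahout), a
validator could be as small as:

    // Hypothetical hook run over parsedArgs before the job is submitted.
    public interface ArgumentValidator {
      /** @return an error message, or null if the value is acceptable. */
      String validate(String name, String value);
    }

    // e.g. a check that an input path actually exists, using Hadoop's
    // org.apache.hadoop.fs.FileSystem and Path:
    ArgumentValidator inputExists = new ArgumentValidator() {
      public String validate(String name, String value) {
        try {
          FileSystem fs = FileSystem.get(new Configuration());
          return fs.exists(new Path(value)) ? null : name + " does not exist: " + value;
        } catch (IOException ioe) {
          return "unable to check " + name + ": " + ioe.getMessage();
        }
      }
    };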

On Fri, May 28, 2010 at 10:55 AM, Sean Owen <[email protected]> wrote:

> Does it help to note this is Hadoop's flag? It seemed more standard,
> therefore possibly more intuitive for some already using Hadoop. We were
> starting to reinvent many flags this way, so it seemed better not to thunk
> them with no gain.
>
> On May 28, 2010 6:06 AM, "Grant Ingersoll" <[email protected]> wrote:
>
> I just saw that too, and it seems like a loss to me.  We did a lot of work
> to be consistent on this and have a lot of documentation out there with it
> in it.  -Dmapred.input.dir is so much less intuitive than -i or --input.
>
> -Grant
>
>
> On May 27, 2010, at 9:04 PM, Jake Mannix wrote:
>
> > Is that right? I think the mahout shell script ...
>
