You probably know more about this than I, but conceptually, all the
params for a Hadoop job are in a Configuration. This includes
command-line Hadoop args and anything else the code tosses in.

One way or another the Job has to get such a Configuration. ToolRunner
takes care of this in the command line case. A Java caller would have
to also find a way to construct a Configuration.

That says to me that something somewhere takes a Configuration as
input in order to run the job. And that something can be called from a
main() method plus ToolRunner for the command line case, or directly
as a Java method.

I think that's about the right idea in theory, don't know how it
meshes with practice?


No the static method thing is something Findbugs and PMD flag, as well
as IntelliJ. It's not always true that a method should be static
because it can (for example if you design for inheritance) but I had
thought this was not the nature of a Driver.

On Wed, Sep 22, 2010 at 7:32 PM, Jeff Eastman
<[email protected]> wrote:
>  Exactly my point. Current clustering and a lot of other drivers don't call
> ToolRunner in their main method; they do new Driver().run(). This needs to
> be changed everywhere. The job() methods currently create new Configuration
> objects since they are invoked mostly from Java in unit tests and layered
> jobs (e.g. synthetic control). I've got a version of Canopy that does call
> ToolRunner and it does return a populated Configuration from getConf() but,
> since the job methods are now static, they can't call it; it needs to be an
> explicit argument. So, I've added conf as the first parameter to job() (and
> left a convenience version without it), and that seems to work.
>
> Now I'm trying to use a -D argument to set a configuration parameter but the
> parser won't accept it. I've tried -D foo.bar.baz=11 and -Dfoo.bar.baz=11
> with no joy on either. What is the correct syntax?
>
> On the separate question of explicit numReducers arguments to the Java
> methods and the CLI I'm all for doing it consistently. It's more work for
> Java callers to create and set the conf parameter than it is with an
> explicit argument but most current callers would use the convenience method
> anyway.
>
> On the static conversions themselves, new Foo().run() is how they used to do
> it but, as you noted earlier, it should be ToolRunner.run(class, conf, args)
> anyway. Since run() *is* an instance method it seemed more correct to have
> the methods it called also be instance methods. In clustering, the methods
> used to be static when I wrote them so I can't claim to be an OO purist,
> though I still don't like them. Just trying to sort out the motivation for
> the change: was this PMD, Checkstyle, or Seanstyle <g>?
>
> On 9/22/10 1:53 PM, Sean Owen wrote:
>>
>> Let me try
>>
>> On Wed, Sep 22, 2010 at 3:32 PM, Jeff Eastman
>> <[email protected]>  wrote:
>>>
>>>  The clustering drivers all call new Configuration() in their
>>> implementations. When run only from the CLI, other Mahout jobs call
>>> getConf() which is where the -D arguments get pulled in (right?). So
>>> there
>>
>> This comes from using ToolRunner.run(). It sets up all those args, and
>> then calls Tool.run(). So when you implement Tool, in run(), the
>> result of getConf() has all that stuff.
>>
>> Inside, it's org.apache.hadoop.util.GenericOptionsParser that does that
>> work.
>>
>> I think your point is that this doesn't hold up for the case of
>> invoking from some arbitrary Java calling code. Yes, in that case, the
>> caller might have to populate a Configuration object (or be able to
>> modify it) to pass this sort of setting. At least that's how I'd play
>> it.
>>
>> But then the question of adding a new command-line argument doesn't
>> help this use case anyway.
>>
>> Am I following?
>>
>>
>>> And what was the PMD/Checkstyle problem with instance methods on the
>>> drivers
>>> that motivated the regression to statics? I hate statics.
>>
>> The reasoning was simply that the methods used no instance methods or
>> members. It was already "really" a static method.
>>
>> I have little problem with the hard-line OO approach that even such
>> Driver classes ought to be full of instance methods anyway, and
>> perhaps have this bit of glue to the non-object-oriented world at the
>> end:
>>
>> public static void main(String[] args) {
>>   new Foo().doIt();
>> }
>>
>> ... but I guess I'm saying it did not seem to be written that way?
>> Things were passed around as method args when they could otherwise be
>> instance members. So it looked like the intent was a static method
>> anyhow.
>>
>

Reply via email to