[ 
https://issues.apache.org/jira/browse/MAHOUT-294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888561#action_12888561
 ] 

Jeff Eastman edited comment on MAHOUT-294 at 7/14/10 7:12 PM:
--------------------------------------------------------------

I've seen the light and attached is a patch that goes most of the way towards 
being a good child of AbstractJob. It refactors the command parsing out of 
main() into run() and also refactors the guts out of static runJob() into 
non-static job() which does all the heavy lifting. Job() is also called from 
run() so the whole thing hangs together pretty well and none of the runJob 
clients were impacted.

On the DefaultOptionsCreator, I did some constant extraction and inlined any 
methods that were called only by one other driver. The remaining options are 
shared and i used addOption(DefaultOptionsCreator.blahOption().create()) rather 
than exploding them to use the expanded addOption(...). I found a precedent for 
this in CollocDriver and it seemed like a lot of busy work to refactor all the 
usages of DefaultOptionsCreator so I did not do that.

I also did not use AbstractJob.prepareJob() but left the individual conf and 
job initializations in place since they are working. There was also precedent 
for this. Clustering has several job steps (e.g. runClustering)  which do not 
use reducers and prepareJob doesn't work for them.

In terms of feedback on the AbstractJob design, I find the need to create two 
constants unwieldy as in:
{code}
  private static final String NUM_CLUSTERS_OPTION = "numClusters";
  public static final String NUM_CLUSTERS_OPTION_KEY = "--" + 
NUM_CLUSTERS_OPTION;
{code}

since the options argument map is keyed by "--" prepended to the option's long 
name. My preference would be to omit that but I see the code is using 
Option.preferredName() which is not subject to modification.

All the unit tests run but they don't really test the command line processing. 
The clustering tests use runJob() with java arguments instead. I admit to being 
a bit on the lazy side in terms of some possible refactoring but think this is 
pretty close to the target. 


      was (Author: jeastman):
    I've seen the light and attached is a patch that goes most of the way 
towards being a good child of AbstractJob. It refactors the command parsing out 
of main() into run() and also refactors the guts out of static runJob() into 
non-static job() which does all the heavy lifting. Job() is also called from 
run() so the whole thing hangs together pretty well and none of the runJob 
clients were impacted.

On the DefaultOptionsCreator, I did some constant extraction and inlined any 
methods that were called only by one other driver. The remaining options are 
shared and i used addOption(DefaultOptionsCreator.blahOption().create()) rather 
than exploding them to use the expanded addOption(...). I found a precedent for 
this in CollocDriver and it seemed like a lot of busy work to refactor all the 
usages of DefaultOptionsCreator so I did not do that.

I also did not use AbstractJob.prepareJob() but left the individual conf and 
job initializations in place since they are working. There was also precedent 
for this. Clustering has several job steps (e.g. runClustering)  which do not 
use reducers and prepareJob doesn't work for them.

In terms of feedback on the AbstractJob design, I find the need to create two 
constants unwieldy as in:
{code}
  private static final String NUM_CLUSTERS_OPTION = "numClusters";
  public static final Object NUM_CLUSTERS_OPTION_KEY = "--" + 
NUM_CLUSTERS_OPTION;
{code}

since the options argument map is keyed by "--" prepended to the option's long 
name. My preference would be to omit that but I see the code is using 
Option.preferredName() which is not subject to modification.

All the unit tests run but they don't really test the command line processing. 
The clustering tests use runJob() with java arguments instead. I admit to being 
a bit on the lazy side in terms of some possible refactoring but think this is 
pretty close to the target. 

  
> Uniform API behavior for Jobs
> -----------------------------
>
>                 Key: MAHOUT-294
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-294
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification, Clustering, Collaborative Filtering, 
> Frequent Itemset/Association Rule Mining, Genetic Algorithms, Math, Utils
>    Affects Versions: 0.4
>            Reporter: Robin Anil
>             Fix For: 0.4
>
>         Attachments: MAHOUT-294.patch, MAHOUT-294.patch
>
>
> * Move AbstractJob to common and convert all the Driver classes to extend 
> that.
>    One suggestion is:
>    AlgorithmParams params = ParamsBuilder.build().withParam("-i", 
> input).withParam("-o", output)....
>    MyAlgorithmn.runJob(params) throws ParameterMissingException;
> * Give uniform command-line parameters for various algorithms.
>    e.g Currently distance measure is -d, -dm, -m at different places in 
> clustering
> * Add a temp directory as a parameter 
> http://www.lucidimagination.com/search/document/28a979aa62c02a1/who_owns_mahout_bucket_on_s3#ddb5855e8bdace45
> This issue will keep track of all discussion/patches related to the design 
> and cleanup of Mahout API

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to