[
https://issues.apache.org/jira/browse/MAHOUT-294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888561#action_12888561
]
Jeff Eastman edited comment on MAHOUT-294 at 7/14/10 7:12 PM:
--------------------------------------------------------------
I've seen the light and attached is a patch that goes most of the way towards
being a good child of AbstractJob. It refactors the command parsing out of
main() into run() and also refactors the guts out of static runJob() into
non-static job() which does all the heavy lifting. Job() is also called from
run() so the whole thing hangs together pretty well and none of the runJob
clients were impacted.
On the DefaultOptionsCreator, I did some constant extraction and inlined any
methods that were called only by one other driver. The remaining options are
shared and i used addOption(DefaultOptionsCreator.blahOption().create()) rather
than exploding them to use the expanded addOption(...). I found a precedent for
this in CollocDriver and it seemed like a lot of busy work to refactor all the
usages of DefaultOptionsCreator so I did not do that.
I also did not use AbstractJob.prepareJob() but left the individual conf and
job initializations in place since they are working. There was also precedent
for this. Clustering has several job steps (e.g. runClustering) which do not
use reducers and prepareJob doesn't work for them.
In terms of feedback on the AbstractJob design, I find the need to create two
constants unwieldy as in:
{code}
private static final String NUM_CLUSTERS_OPTION = "numClusters";
public static final String NUM_CLUSTERS_OPTION_KEY = "--" +
NUM_CLUSTERS_OPTION;
{code}
since the options argument map is keyed by "--" prepended to the option's long
name. My preference would be to omit that but I see the code is using
Option.preferredName() which is not subject to modification.
All the unit tests run but they don't really test the command line processing.
The clustering tests use runJob() with java arguments instead. I admit to being
a bit on the lazy side in terms of some possible refactoring but think this is
pretty close to the target.
was (Author: jeastman):
I've seen the light and attached is a patch that goes most of the way
towards being a good child of AbstractJob. It refactors the command parsing out
of main() into run() and also refactors the guts out of static runJob() into
non-static job() which does all the heavy lifting. Job() is also called from
run() so the whole thing hangs together pretty well and none of the runJob
clients were impacted.
On the DefaultOptionsCreator, I did some constant extraction and inlined any
methods that were called only by one other driver. The remaining options are
shared and i used addOption(DefaultOptionsCreator.blahOption().create()) rather
than exploding them to use the expanded addOption(...). I found a precedent for
this in CollocDriver and it seemed like a lot of busy work to refactor all the
usages of DefaultOptionsCreator so I did not do that.
I also did not use AbstractJob.prepareJob() but left the individual conf and
job initializations in place since they are working. There was also precedent
for this. Clustering has several job steps (e.g. runClustering) which do not
use reducers and prepareJob doesn't work for them.
In terms of feedback on the AbstractJob design, I find the need to create two
constants unwieldy as in:
{code}
private static final String NUM_CLUSTERS_OPTION = "numClusters";
public static final Object NUM_CLUSTERS_OPTION_KEY = "--" +
NUM_CLUSTERS_OPTION;
{code}
since the options argument map is keyed by "--" prepended to the option's long
name. My preference would be to omit that but I see the code is using
Option.preferredName() which is not subject to modification.
All the unit tests run but they don't really test the command line processing.
The clustering tests use runJob() with java arguments instead. I admit to being
a bit on the lazy side in terms of some possible refactoring but think this is
pretty close to the target.
> Uniform API behavior for Jobs
> -----------------------------
>
> Key: MAHOUT-294
> URL: https://issues.apache.org/jira/browse/MAHOUT-294
> Project: Mahout
> Issue Type: Improvement
> Components: Classification, Clustering, Collaborative Filtering,
> Frequent Itemset/Association Rule Mining, Genetic Algorithms, Math, Utils
> Affects Versions: 0.4
> Reporter: Robin Anil
> Fix For: 0.4
>
> Attachments: MAHOUT-294.patch, MAHOUT-294.patch
>
>
> * Move AbstractJob to common and convert all the Driver classes to extend
> that.
> One suggestion is:
> AlgorithmParams params = ParamsBuilder.build().withParam("-i",
> input).withParam("-o", output)....
> MyAlgorithmn.runJob(params) throws ParameterMissingException;
> * Give uniform command-line parameters for various algorithms.
> e.g Currently distance measure is -d, -dm, -m at different places in
> clustering
> * Add a temp directory as a parameter
> http://www.lucidimagination.com/search/document/28a979aa62c02a1/who_owns_mahout_bucket_on_s3#ddb5855e8bdace45
> This issue will keep track of all discussion/patches related to the design
> and cleanup of Mahout API
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.