[ 
https://issues.apache.org/jira/browse/MAHOUT-294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Eastman updated MAHOUT-294:
--------------------------------

    Attachment: MAHOUT-294a.patch

Here's a stab at improving the testability of AbstractJob options parsing. The 
patch adds an argMap variable to AbstractJob, plus new getOption() and 
hasOption() methods that encapsulate the "--" prepending, which avoids 
additional constants. Factoring out ClusterDumper.addOptions() as a public 
method allows unit testing of the command line processing without invoking the 
cluster dumper. We could require this in all subclasses by adding 
AbstractJob.run() and having it call a new abstract addOptions(). That would 
have a broad impact on all drivers, so I have not done it in this patch.
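
To make the idea concrete, here is a minimal, self-contained sketch of the 
pattern described above. It is not the code in the attached patch: SketchJob, 
SketchDumper, addOption() and parseArguments() are hypothetical names invented 
for illustration, and the parsing is a deliberately naive stand-in for the real 
command line handling. Only argMap, getOption(), hasOption() and the public 
addOptions() correspond to names mentioned in the comment.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for AbstractJob; not Mahout's actual class.
abstract class SketchJob {
  // Parsed arguments keyed by the full flag, e.g. "--input".
  protected final Map<String, String> argMap = new HashMap<>();
  // Flags declared by the subclass, stored with the "--" prefix.
  private final Set<String> declared = new LinkedHashSet<>();

  // Declare an option by its bare name; the prefix is added here only.
  protected void addOption(String name) {
    declared.add("--" + name);
  }

  // Subclasses declare their options here; public so tests can call it.
  public abstract void addOptions();

  // Naive parser (an assumption for this sketch): a flag followed by a
  // token that does not start with "--" takes that token as its value.
  public void parseArguments(String[] args) {
    argMap.clear();
    for (int i = 0; i < args.length; i++) {
      String key = args[i];
      if (!declared.contains(key)) {
        throw new IllegalArgumentException("Unknown option: " + key);
      }
      boolean hasValue = i + 1 < args.length && !args[i + 1].startsWith("--");
      argMap.put(key, hasValue ? args[++i] : null);
    }
  }

  // Callers pass the bare name; the "--" prepending is encapsulated.
  public String getOption(String name) {
    return argMap.get("--" + name);
  }

  public boolean hasOption(String name) {
    return argMap.containsKey("--" + name);
  }
}

// Hypothetical driver, loosely playing the role of ClusterDumper.
class SketchDumper extends SketchJob {
  @Override
  public void addOptions() {
    addOption("input");
    addOption("output");
    addOption("overwrite");
  }
}
```

A unit test could then construct a SketchDumper, call addOptions() and 
parseArguments(), and check getOption("input") or hasOption("overwrite") 
without ever running the job itself.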

As a further step, one could imagine moving all of the common options from 
DefaultOptionCreator into AbstractJob. This would put all of the Mahout shared 
command line options in a single place, improving consistency.

Comments on this approach are welcome. I'm gone for the weekend.

> Uniform API behavior for Jobs
> -----------------------------
>
>                 Key: MAHOUT-294
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-294
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification, Clustering, Collaborative Filtering, 
> Frequent Itemset/Association Rule Mining, Genetic Algorithms, Math, Utils
>    Affects Versions: 0.4
>            Reporter: Robin Anil
>             Fix For: 0.4
>
>         Attachments: MAHOUT-294.patch, MAHOUT-294.patch, MAHOUT-294a.patch
>
>
> * Move AbstractJob to common and convert all the Driver classes to extend 
> that.
>    One suggestion is:
>    AlgorithmParams params = ParamsBuilder.build().withParam("-i", input)
>        .withParam("-o", output)....
>    MyAlgorithm.runJob(params) throws ParameterMissingException;
> * Give uniform command-line parameters for various algorithms.
>    E.g., the distance measure is currently -d, -dm, or -m in different places 
> in clustering.
> * Add a temp directory as a parameter:
> http://www.lucidimagination.com/search/document/28a979aa62c02a1/who_owns_mahout_bucket_on_s3#ddb5855e8bdace45
>
> This issue will keep track of all discussion/patches related to the design 
> and cleanup of the Mahout API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
