That's fine; clustering should be included in the rest of the job
consistency work too. On LDA at least, if you look at the driver, it's
taking the -1 default from the options builder and setting topic
smoothing to 50/numTopics. I can't really pass that default into the
options builder, since it has not yet read the other options. Good catch
on -k though; for Dirichlet it is required. I'll change the option to
.withRequired(false) and add .withRequired(true) in the Dirichlet jobs,
which do require it.
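To make the LDA point concrete, here is a minimal sketch of the sentinel-default idea in plain Java. It is not the actual driver code: the class, method, and constant names are made up for illustration, and only the -1 sentinel and the 50/numTopics fallback come from the discussion above.

```java
final class TopicSmoothingDefault {

    // Sentinel the options builder hands back when the user gave no value.
    static final double UNSET = -1.0;

    // Resolve the effective topic smoothing once numTopics is known.
    // A negative parsed value is treated as "not specified".
    static double resolveTopicSmoothing(double parsedValue, int numTopics) {
        if (numTopics <= 0) {
            throw new IllegalArgumentException("numTopics must be positive");
        }
        return parsedValue < 0 ? 50.0 / numTopics : parsedValue;
    }

    public static void main(String[] args) {
        // No value given: falls back to 50 / numTopics.
        System.out.println(resolveTopicSmoothing(UNSET, 20)); // prints 2.5
        // Explicit value given: used as-is.
        System.out.println(resolveTopicSmoothing(0.1, 20));   // prints 0.1
    }
}
```

The fallback has to live in the driver because it depends on numTopics, which is only available after all options have been parsed.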
In general, since different algorithms have different required options,
perhaps it would be best to have the DefaultOptionCreator not mark any
option as required and leave the required/optional determination to the
various drivers.
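As a rough sketch of what "do the determination in the drivers" could look like, each driver could declare its own required flags and check the parsed options against them. This is plain Java with hypothetical names, not the Mahout or option-builder API, and the -i/-o entries in the sets are assumptions for illustration; only the -k difference between KMeans and Dirichlet comes from this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class RequiredOptionsSketch {

    // Illustrative per-driver requirements (the -i and -o entries are assumed).
    static final Set<String> KMEANS_REQUIRED =
        new HashSet<>(Arrays.asList("-i", "-o"));
    static final Set<String> DIRICHLET_REQUIRED =
        new HashSet<>(Arrays.asList("-i", "-o", "-k"));

    // Return the required flags that are absent from the parsed options, sorted.
    static List<String> missing(Set<String> required, Map<String, String> parsed) {
        Set<String> absent = new TreeSet<>(required);
        absent.removeAll(parsed.keySet());
        return new ArrayList<>(absent);
    }

    public static void main(String[] args) {
        Map<String, String> parsed = new HashMap<>();
        parsed.put("-i", "input");
        parsed.put("-o", "output");

        // The same parsed options pass for one driver and fail for the other.
        System.out.println(missing(KMEANS_REQUIRED, parsed));    // prints []
        System.out.println(missing(DIRICHLET_REQUIRED, parsed)); // prints [-k]
    }
}
```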
On 5/22/10 6:31 PM, Robin Anil (JIRA) wrote:
[
https://issues.apache.org/jira/browse/MAHOUT-294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robin Anil updated MAHOUT-294:
------------------------------
Component/s: Clustering
Adding clustering back. Saw some bugs:
KMeans: the -k parameter was marked required=true, so it was overwriting the
centroids even when -k was not specified, instead of reading them
LDA: Topic smoothing was changed to a default of -1 (it should be 50/numTopics)
Uniform API behavior for Jobs
-----------------------------
Key: MAHOUT-294
URL: https://issues.apache.org/jira/browse/MAHOUT-294
Project: Mahout
Issue Type: Improvement
Components: Classification, Clustering, Collaborative Filtering,
Frequent Itemset/Association Rule Mining, Genetic Algorithms, Math, Utils
Affects Versions: 0.4
Reporter: Robin Anil
Fix For: 0.4
* Move AbstractJob to common and convert all the Driver classes to extend that.
One suggestion is:
AlgorithmParams params = ParamsBuilder.build().withParam("-i",
input).withParam("-o", output)....
MyAlgorithm.runJob(params) throws ParameterMissingException;
* Give uniform command-line parameters for various algorithms.
e.g., the distance measure is currently -d, -dm, or -m in different places in
clustering
* Add a temp directory as a parameter
http://www.lucidimagination.com/search/document/28a979aa62c02a1/who_owns_mahout_bucket_on_s3#ddb5855e8bdace45
This issue will keep track of all discussion/patches related to the design and
cleanup of the Mahout API
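For illustration only, the builder suggestion in the description might be fleshed out along these lines. AlgorithmParams, ParamsBuilder, withParam, runJob, and ParameterMissingException are the names used in the suggestion; everything else (the create() terminal step, the get() accessor, the body of runJob) is a hypothetical placeholder, since the description elides how the chain ends.

```java
import java.util.HashMap;
import java.util.Map;

final class ParamsBuilderSketch {

    /** Thrown when a job asks for a parameter that was never supplied. */
    static final class ParameterMissingException extends Exception {
        ParameterMissingException(String flag) {
            super("Missing required parameter: " + flag);
        }
    }

    /** Snapshot of the parameters handed to a job. */
    static final class AlgorithmParams {
        private final Map<String, String> values;

        AlgorithmParams(Map<String, String> values) {
            this.values = new HashMap<>(values); // defensive copy
        }

        // Hypothetical accessor: fails if the flag was not supplied.
        String get(String flag) throws ParameterMissingException {
            String value = values.get(flag);
            if (value == null) {
                throw new ParameterMissingException(flag);
            }
            return value;
        }
    }

    static final class ParamsBuilder {
        private final Map<String, String> values = new HashMap<>();

        static ParamsBuilder build() {
            return new ParamsBuilder();
        }

        ParamsBuilder withParam(String flag, String value) {
            values.put(flag, value);
            return this;
        }

        // Hypothetical terminal step; not named in the description.
        AlgorithmParams create() {
            return new AlgorithmParams(values);
        }
    }

    // Stand-in for an algorithm's runJob: it only reads its two parameters.
    static String runJob(AlgorithmParams params) throws ParameterMissingException {
        return params.get("-i") + " -> " + params.get("-o");
    }

    public static void main(String[] args) {
        try {
            AlgorithmParams ok = ParamsBuilder.build()
                .withParam("-i", "input")
                .withParam("-o", "output")
                .create();
            System.out.println(runJob(ok));
            // Omitting -o makes runJob throw.
            runJob(ParamsBuilder.build().withParam("-i", "input").create());
        } catch (ParameterMissingException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the job, not the builder, decides which parameters are mandatory, which matches the idea of each algorithm having its own required options.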