Hey Jeff, your 0.20 upgrade patch is hanging. Any idea when Hadoop 0.21 will be released? The freeze date vote was called in June-July 2009. Anyone know what's happening with Hadoop? This waiting kinda sucks if you ask me.
On Sun, May 23, 2010 at 8:32 PM, Jeff Eastman <[email protected]> wrote:

> That's fine, clustering should be included in all the rest of the job
> consistency work too. On LDA at least, if you look at the driver its taking
> the -1 default from the options builder and setting topic smoothing to
> 50/numTopics. Can't really pass that default into the options builder since
> it has not yet read the other options. Good catch on -k though, for
> Dirichlet it is required. I'll change the option to .withRequired(false) and
> add .withRequired(true) in the Dirichlet jobs which do require it.
>
> In general, since different algorithms have different required options,
> perhaps it would be best to have the DefaultOptionCreator not set this for
> any options and do the required/optional determination in the various
> drivers.
>
>
> On 5/22/10 6:31 PM, Robin Anil (JIRA) wrote:
>
>> [
>> https://issues.apache.org/jira/browse/MAHOUT-294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>>
>> Robin Anil updated MAHOUT-294:
>> ------------------------------
>>
>> Component/s: Clustering
>>
>> Adding clustering back. Saw some bugs
>>
>> KMeans put the -k parameter as required=true. So It was overwriting
>> centroids even when not specified, instead of reading it
>> LDA: Topic smoothing was changed to default of -1 (it should be
>> 50/numTopics)
>>
>>
>>
>>> Uniform API behavior for Jobs
>>> -----------------------------
>>>
>>> Key: MAHOUT-294
>>> URL: https://issues.apache.org/jira/browse/MAHOUT-294
>>> Project: Mahout
>>> Issue Type: Improvement
>>> Components: Classification, Clustering, Collaborative Filtering,
>>> Frequent Itemset/Association Rule Mining, Genetic Algorithms, Math, Utils
>>> Affects Versions: 0.4
>>> Reporter: Robin Anil
>>> Fix For: 0.4
>>>
>>>
>>> * Move AbstractJob to common and convert all the Driver classes to extend
>>> that.
>>> One suggestion is:
>>> AlgorithmParams params = ParamsBuilder.build().withParam("-i",
>>> input).withParam("-o", output)....
>>> MyAlgorithmn.runJob(params) throws ParameterMissingException;
>>> * Give uniform command-line parameters for various algorithms.
>>> e.g Currently distance measure is -d, -dm, -m at different places in
>>> clustering
>>> * Add a temp directory as a parameter
>>> http://www.lucidimagination.com/search/document/28a979aa62c02a1/who_owns_mahout_bucket_on_s3#ddb5855e8bdace45
>>> This issue will keep track of all discussion/patches related to the
>>> design and cleanup of Mahout API
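
For reference, here is a rough sketch of the LDA default handling described in the quoted message: the option parser hands back -1 when topic smoothing was not given, and the driver swaps in 50/numTopics once it knows the topic count. The class and method names below (TopicSmoothingDefault, resolve) are made up for illustration and are not Mahout code; only the -1 sentinel and the 50/numTopics rule come from the thread.

```java
// Illustrative only: TopicSmoothingDefault and resolve() are hypothetical names,
// not part of Mahout. The -1 sentinel and 50/numTopics rule are from the thread.
class TopicSmoothingDefault {

    // Sentinel the option parser returns when the user did not set topic smoothing.
    static final double UNSET = -1.0;

    // The real default depends on numTopics, which is only known after all
    // options have been parsed, so it is resolved here rather than in the parser.
    static double resolve(double parsedValue, int numTopics) {
        if (numTopics <= 0) {
            throw new IllegalArgumentException("numTopics must be positive");
        }
        return parsedValue == UNSET ? 50.0 / numTopics : parsedValue;
    }

    public static void main(String[] args) {
        System.out.println(resolve(UNSET, 20)); // smoothing not given: 50/20
        System.out.println(resolve(0.1, 20));   // explicit value is kept
    }
}
```

The same pattern would apply to -k: leave the shared option optional and let each driver decide after parsing whether a missing value is an error.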
