Hey Jeff, your 0.20 upgrade patch is hanging. Any idea when Hadoop 0.21
will be released? The freeze date vote was called in June-July 2009. Does anyone
know what's happening with Hadoop? This waiting kinda sucks, if you ask me.


On Sun, May 23, 2010 at 8:32 PM, Jeff Eastman <[email protected]> wrote:

> That's fine, clustering should be included in all the rest of the job
> consistency work too. On LDA at least, if you look at the driver it's taking
> the -1 default from the options builder and setting topic smoothing to
> 50/numTopics. Can't really pass that default into the options builder since
> it has not yet read the other options. Good catch on -k though, for
> Dirichlet it is required. I'll change the option to .withRequired(false) and
> add .withRequired(true) in the Dirichlet jobs which do require it.
>
> In general, since different algorithms have different required options,
> perhaps it would be best to have the DefaultOptionCreator not set this for
> any options and do the required/optional determination in the various
> drivers.
>
>
> On 5/22/10 6:31 PM, Robin Anil (JIRA) wrote:
>
>>      [
>> https://issues.apache.org/jira/browse/MAHOUT-294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>>
>> Robin Anil updated MAHOUT-294:
>> ------------------------------
>>
>>     Component/s: Clustering
>>
>> Adding clustering back. Saw some bugs
>>
>> KMeans: the -k parameter was set as required=true, so it was overwriting
>> centroids even when not specified, instead of reading them.
>> LDA: topic smoothing was changed to a default of -1 (it should be
>> 50/numTopics).
>>
>>
>>
>>> Uniform API behavior for Jobs
>>> -----------------------------
>>>
>>>                 Key: MAHOUT-294
>>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-294
>>>             Project: Mahout
>>>          Issue Type: Improvement
>>>          Components: Classification, Clustering, Collaborative Filtering,
>>> Frequent Itemset/Association Rule Mining, Genetic Algorithms, Math, Utils
>>>    Affects Versions: 0.4
>>>            Reporter: Robin Anil
>>>             Fix For: 0.4
>>>
>>>
>>> * Move AbstractJob to common and convert all the Driver classes to extend
>>> that.
>>>    One suggestion is:
>>>    AlgorithmParams params = ParamsBuilder.build().withParam("-i",
>>> input).withParam("-o", output)....
>>>    MyAlgorithm.runJob(params) throws ParameterMissingException;
>>> * Give uniform command-line parameters for various algorithms.
>>>    e.g. currently the distance measure is -d, -dm, -m at different places in
>>> clustering
>>> * Add a temp directory as a parameter
>>> http://www.lucidimagination.com/search/document/28a979aa62c02a1/who_owns_mahout_bucket_on_s3#ddb5855e8bdace45
>>> This issue will keep track of all discussion/patches related to the
>>> design and cleanup of Mahout API
>>>
>>>
>>
>>
>
>
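For the LDA point in the quoted mail, here is a rough sketch of how a driver could turn the -1 placeholder into 50/numTopics once all options have been parsed. The class and method names are made up for illustration and are not taken from the Mahout code; only the -1 default and the 50/numTopics rule come from the thread.

```java
// Hypothetical sketch; class and method names are illustrative, not Mahout's API.
final class TopicSmoothingDefault {

  /** Sentinel from the options builder meaning "the user did not supply a value". */
  static final double UNSET = -1.0;

  /**
   * Resolves the topic smoothing value after all options have been read.
   * The options builder cannot compute 50/numTopics itself because it has
   * not yet seen numTopics, so the driver does it here.
   */
  static double resolveTopicSmoothing(double parsedValue, int numTopics) {
    if (numTopics <= 0) {
      throw new IllegalArgumentException("numTopics must be positive: " + numTopics);
    }
    // Only the sentinel is replaced; an explicit user value is kept untouched.
    return parsedValue == UNSET ? 50.0 / numTopics : parsedValue;
  }

  public static void main(String[] args) {
    System.out.println(resolveTopicSmoothing(UNSET, 20)); // sentinel replaced by 50/numTopics
    System.out.println(resolveTopicSmoothing(0.1, 20));   // explicit value kept
  }
}
```

The idea is just that the sentinel never reaches the algorithm: it is swapped for the computed default in one place, right after parsing.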

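On the required/optional question and the params suggestion in MAHOUT-294, something along these lines might work: the shared params object only stores values, and each driver declares what it requires. This is only a sketch under my own assumptions. AlgorithmParams and ParameterMissingException are the names from the JIRA description, but their shape here is guessed, the builder is folded into the params class for brevity, and the two run methods are stand-ins rather than real Mahout drivers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of this is existing Mahout code.
final class ParamsSketch {

  /** Unchecked here for brevity (an assumption); the JIRA text only names the exception. */
  static final class ParameterMissingException extends IllegalArgumentException {
    ParameterMissingException(String name) {
      super("Missing required parameter: " + name);
    }
  }

  /** Holds option values; knows nothing about which options are required. */
  static final class AlgorithmParams {
    private final Map<String, String> values = new LinkedHashMap<>();

    AlgorithmParams withParam(String name, String value) {
      values.put(name, value);
      return this;
    }

    boolean has(String name) {
      return values.containsKey(name);
    }

    /** Each driver calls this with its own list of required options. */
    void require(String... names) throws ParameterMissingException {
      for (String name : names) {
        if (!has(name)) {
          throw new ParameterMissingException(name);
        }
      }
    }
  }

  /** k-means stand-in: -k is optional; without it, existing centroids are read. */
  static String runKMeans(AlgorithmParams params) throws ParameterMissingException {
    params.require("-i", "-o");
    return params.has("-k") ? "sample new centroids" : "read existing centroids";
  }

  /** Dirichlet stand-in: -k is required. */
  static String runDirichlet(AlgorithmParams params) throws ParameterMissingException {
    params.require("-i", "-o", "-k");
    return "run with k models";
  }

  public static void main(String[] args) {
    AlgorithmParams params =
        new AlgorithmParams().withParam("-i", "input").withParam("-o", "output");
    System.out.println(runKMeans(params));
    try {
      runDirichlet(params);
    } catch (ParameterMissingException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With this split, the same params can be handed to both jobs: the k-means stand-in accepts them without -k, while the Dirichlet stand-in rejects them.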