[ 
https://issues.apache.org/jira/browse/MAHOUT-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646967#action_12646967
 ] 

Ted Dunning commented on MAHOUT-30:
-----------------------------------

Jeff,

These look like really nice refactorings.  The process is nice and clear.

The one key point that may confuse people is that every step is a sampling 
step.  Thus assignment to clusters does NOT assign each point to the best 
cluster; it picks a cluster at random, biased by the mixture parameters and the 
model pdfs.  Likewise, model computation does NOT compute the best model; it 
samples a model from the posterior distribution given the assigned data.  The 
same is true for the mixture parameters.

Your code does this.  I just think that this is a hard point for people to 
understand in these techniques. 
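
To make the distinction concrete, here is a minimal sketch in Python.  It is 
not the code from the patch.  It assumes a finite mixture of one-dimensional, 
unit-variance Gaussians with a normal prior on each mean and a symmetric 
Dirichlet prior on the mixture weights, and every name in it (sample_assignment, 
sample_mean, sample_mixture, gibbs_step, prior_var, alpha) is made up for 
illustration.  The point is only that each of the three steps draws a random 
value where a k-means or EM style algorithm would take an argmax or a mean.

```python
import math
import random


def normal_pdf(x, mean, var=1.0):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def sample_assignment(x, weights, means, rng):
    """Pick a cluster at random, biased by mixture weight times model pdf.

    A "best cluster" rule would return the index of max(scores) instead.
    Assumes at least one score is non-zero (x is not absurdly far away).
    """
    scores = [w * normal_pdf(x, m) for w, m in zip(weights, means)]
    return rng.choices(range(len(scores)), weights=scores, k=1)[0]


def sample_mean(points, rng, prior_var=100.0):
    """Draw a cluster mean from its posterior instead of using the average.

    Assumed model: unit-variance likelihood, N(0, prior_var) prior on the mean.
    An empty cluster simply draws from the prior.
    """
    post_var = 1.0 / (1.0 / prior_var + len(points))
    post_mean = post_var * sum(points)
    return rng.gauss(post_mean, math.sqrt(post_var))


def sample_mixture(counts, alpha, rng):
    """Draw mixture weights from Dirichlet(alpha/k + counts) via gamma draws."""
    k = len(counts)
    draws = [rng.gammavariate(alpha / k + c, 1.0) for c in counts]
    total = sum(draws)
    return [d / total for d in draws]


def gibbs_step(data, weights, means, alpha, rng):
    """One pass: sample assignments, then sample models, then sample weights."""
    members = [[] for _ in means]
    for x in data:
        members[sample_assignment(x, weights, means, rng)].append(x)
    new_means = [sample_mean(pts, rng) for pts in members]
    new_weights = sample_mixture([len(pts) for pts in members], alpha, rng)
    return new_weights, new_means
```

A point that sits exactly between two equally weighted clusters lands in either 
one with equal probability, so repeated calls to sample_assignment for the same 
point return different answers.  That is the intended behaviour, not a bug.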

> dirichlet process implementation
> --------------------------------
>
>                 Key: MAHOUT-30
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-30
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Clustering
>            Reporter: Isabel Drost
>         Attachments: MAHOUT-30.patch
>
>
> Copied over from original issue:
> > Further extension can also be made by assuming an infinite mixture model. 
> > The implementation is only slightly more difficult and the result is a 
> > (nearly) non-parametric clustering algorithm.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
