[ 
https://issues.apache.org/jira/browse/MAHOUT-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647880#action_12647880
 ] 

Jeff Eastman commented on MAHOUT-30:
------------------------------------

The above patch makes several improvements over the earlier version:
* refactored state updating and cluster sampling into DirichletState
* refactored creation of the list of models into ModelDistribution
* moved the state parameters from DirichletCluster to DirichletState
* moved the count into the model
* changed List<Model> to Model[]
* added significance filtering to the printed output
* increased the number of iterations to 30 to demonstrate better convergence
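
As a rough illustration of the last two refactorings that touch the printout (the count living in the model, and significance filtering of the printed output), here is a minimal Java sketch. It is not the code from the patch: the class CountedModel, the method formatSignificant, and the 5% cutoff are all hypothetical names and values chosen for the example, and the counts are made up.

```java
/** Illustrative sketch only; none of these names come from the patch. */
class SignificanceFilterSketch {

    /** Hypothetical stand-in for a model that carries its own sample count. */
    static final class CountedModel {
        final int count;
        final String label;

        CountedModel(int count, String label) {
            this.count = count;
            this.label = label;
        }

        @Override
        public String toString() {
            return "normal(n=" + count + " " + label + ")";
        }
    }

    /**
     * Formats only the models whose count is at least minFraction of the
     * total count over all models. The cutoff rule is an assumption.
     */
    static String formatSignificant(CountedModel[] models, double minFraction) {
        long total = 0;
        for (CountedModel m : models) {
            total += m.count;
        }
        StringBuilder sb = new StringBuilder();
        for (CountedModel m : models) {
            if (total > 0 && m.count >= minFraction * total) {
                if (sb.length() > 0) {
                    sb.append(", ");
                }
                sb.append(m);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up counts: the last cluster holds 0.5% of the points.
        CountedModel[] models = {
            new CountedModel(600, "a"),
            new CountedModel(300, "b"),
            new CountedModel(95, "c"),
            new CountedModel(5, "d"),
        };
        // Prints: normal(n=600 a), normal(n=300 b), normal(n=95 c)
        System.out.println(formatSignificant(models, 0.05));
    }
}
```

A filter of this kind would also explain why the counts printed for one sample need not add up to the total number of points, although the actual rule used in the patch may differ.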

The algorithm now produces the following output when run over 10,000 points:
* Using fixed random seed for repeatability.
* testDirichletCluster10000
* Generating 4000 samples m=[1.0, 1.0] sd=3.0
* Generating 3000 samples m=[1.0, 0.0] sd=0.1
* Generating 3000 samples m=[0.0, 1.0] sd=0.1
* sample[0]= normal(n=4037 m=[0.80, 0.73] sd=1.40),
    normal(n=3844 m=[0.51, 0.51] sd=0.68),
    normal(n=1092 m=[0.51, 0.47] sd=0.53),
    normal(n=794 m=[1.26, 1.60] sd=2.22),
* sample[1]= normal(n=4562 m=[0.72, 0.68] sd=1.25),
    normal(n=2992 m=[0.48, 0.52] sd=0.58),
    normal(n=1022 m=[0.67, 0.31] sd=0.53),
    normal(n=1227 m=[1.17, 1.41] sd=2.13),
* sample[2]= normal(n=4377 m=[0.66, 0.61] sd=1.08),
    normal(n=2592 m=[0.28, 0.71] sd=0.51),
    normal(n=1057 m=[1.04, -0.06] sd=0.25),
    normal(n=1831 m=[1.15, 1.26] sd=2.05),
* sample[3]= normal(n=4302 m=[0.74, 0.36] sd=0.80),
    normal(n=2075 m=[-0.00, 1.01] sd=0.32),
    normal(n=793 m=[1.04, -0.05] sd=0.20),
    normal(n=2694 m=[1.04, 1.17] sd=1.93),
* sample[4]= normal(n=3602 m=[0.80, 0.21] sd=0.58),
    normal(n=1923 m=[-0.05, 1.05] sd=0.26),
    normal(n=621 m=[1.03, -0.06] sd=0.19),
    normal(n=3677 m=[0.94, 1.09] sd=1.77),
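
For readers who want to reproduce a data set of the same shape as the "Generating ... samples" lines above, the following is a minimal Java sketch. It assumes independent Gaussian noise with the same sd in both dimensions and uses java.util.Random with an arbitrary seed. The class and method names (SampleGeneratorSketch, generateSamples, mean) and the seed value are hypothetical, not taken from the patch, so the numbers it produces will not match the output above.

```java
import java.util.Random;

/** Illustrative sketch only; names and the seed are assumptions. */
class SampleGeneratorSketch {

    /** Draws n 2-d points around the given mean with standard deviation sd. */
    static double[][] generateSamples(int n, double[] mean, double sd, Random rng) {
        double[][] samples = new double[n][mean.length];
        for (int i = 0; i < n; i++) {
            for (int d = 0; d < mean.length; d++) {
                // nextGaussian() is standard normal; scale and shift it.
                samples[i][d] = mean[d] + sd * rng.nextGaussian();
            }
        }
        return samples;
    }

    /** Empirical mean of one dimension, for a quick sanity check. */
    static double mean(double[][] samples, int dim) {
        double sum = 0.0;
        for (double[] s : samples) {
            sum += s[dim];
        }
        return sum / samples.length;
    }

    public static void main(String[] args) {
        // A fixed seed makes the run repeatable; 42 is an arbitrary choice.
        Random rng = new Random(42L);
        double[][] a = generateSamples(4000, new double[] {1.0, 1.0}, 3.0, rng);
        double[][] b = generateSamples(3000, new double[] {1.0, 0.0}, 0.1, rng);
        double[][] c = generateSamples(3000, new double[] {0.0, 1.0}, 0.1, rng);
        System.out.println("total points: " + (a.length + b.length + c.length));
        System.out.printf("empirical mean of second set: [%.2f, %.2f]%n",
                mean(b, 0), mean(b, 1));
    }
}
```

The three calls mirror the three generation lines: one broad cluster (sd=3.0) around [1.0, 1.0] and two tight clusters (sd=0.1) around [1.0, 0.0] and [0.0, 1.0], for 10,000 points in total.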


> dirichlet process implementation
> --------------------------------
>
>                 Key: MAHOUT-30
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-30
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Clustering
>            Reporter: Isabel Drost
>            Assignee: Jeff Eastman
>         Attachments: MAHOUT-30.patch, MAHOUT-30b.patch
>
>
> Copied over from original issue:
> > Further extension can also be made by assuming an infinite mixture model.
> > The implementation is only slightly more difficult and the result is a
> > (nearly) non-parametric clustering algorithm.
