Ted Dunning wrote:
This could also be caused if the prior is very diffuse.  This makes the
probability that a point will go to any new cluster quite low.  You can
compensate somewhat for this with different values of alpha.
Could you elaborate on the function of alpha in the algorithm? Looking at the current implementation, it is only used to initialize the totalCount values (to alpha/k) when sampling from the prior; as far as I can tell it is not used anywhere else. Its current role is pretty minimal, and I wonder whether something fell through the cracks during all of the refactoring from the R prototype.
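For context, in a Chinese Restaurant Process view of the Dirichlet Process, alpha is the concentration parameter that directly sets the probability of opening a new cluster: an existing cluster k is chosen with probability count[k]/(n + alpha) and a brand-new cluster with probability alpha/(n + alpha). A rough sketch in plain Python (not the Mahout code; `crp_assign` and its arguments are made up for illustration):

```python
import random

def crp_assign(counts, alpha, rng=random):
    """Sample a cluster index for one point under a CRP prior.

    Existing cluster k is chosen with probability counts[k] / (n + alpha);
    a new cluster (index len(counts)) with probability alpha / (n + alpha).
    """
    n = sum(counts)
    r = rng.random() * (n + alpha)
    acc = 0.0
    for k, c in enumerate(counts):
        acc += c
        if r < acc:
            return k
    return len(counts)  # open a new cluster
```

With a diffuse prior the model likelihoods for a new cluster are tiny, so a larger alpha is the only lever that keeps new clusters reachable; with alpha near zero, new clusters are essentially never opened.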
I have had some half thoughts about how to improve the mixing and currently
think that starting conditions may be the trick.  Using something like
k-means++ to initialize the clusters might help enormously.
If it helps k-means, it would likely help Dirichlet too. Currently all of the prior sampling is done by model distributions with no knowledge of the dataset, purely via random draws. I looked for a patch for MAHOUT-153 but did not see one yet.
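For reference, k-means++ seeding picks the first center uniformly at random and each subsequent center with probability proportional to its squared distance from the nearest center chosen so far, which is what would make the starting clusters data-aware. A sketch in plain Python (not a MAHOUT-153 patch; `kmeans_pp_init` and its shape are illustrative):

```python
import random

def kmeans_pp_init(points, k, rng=random):
    """k-means++ seeding: first center uniform, later centers drawn with
    probability proportional to squared distance from the nearest center."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # squared distance from each point to its nearest chosen center
        d2 = [min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers)
              for x in points]
        total = sum(d2)
        if total == 0:  # all remaining points coincide with chosen centers
            centers.append(rng.choice(points))
            continue
        r = rng.random() * total
        acc = 0.0
        for x, w in zip(points, d2):
            acc += w
            if r < acc:
                centers.append(x)
                break
    return centers
```

Seeding the Dirichlet models from centers like these, instead of from dataset-blind draws from the prior, is presumably the kind of starting condition Ted has in mind.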
