On 4/7/08 8:29 AM, "Isabel Drost (JIRA)" <[EMAIL PROTECTED]> wrote:
>> An important adjustment is to put a reasonable prior on the distributions
>> being mixed. ...
>
> I think you could impose the same restriction to EM as well?
Absolutely. This changes the EM algorithm from an MLE (maximum likelihood
estimator) method to an MAP (maximum a posteriori) method and should help
with a variety of problems.
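
To make the distinction concrete, here is a rough sketch (Python, not Mahout
code) of a single EM iteration for a 1-D Gaussian mixture where a symmetric
Dirichlet(alpha) prior on the mixing weights turns the M-step into a MAP
update; with alpha = 1 it collapses back to the usual MLE update. Priors on
the means and variances are left out to keep it short.

import numpy as np

def em_step(x, mu, sigma, pi, alpha=2.0):
    # E-step: responsibilities r[n, j] = P(cluster j | x[n])
    k = len(mu)
    dens = np.array([np.exp(-0.5 * ((x - mu[j]) / sigma[j]) ** 2)
                     / (sigma[j] * np.sqrt(2 * np.pi)) for j in range(k)]).T
    r = dens * pi
    r /= r.sum(axis=1, keepdims=True)

    # M-step: MAP weights under the symmetric Dirichlet(alpha) prior;
    # alpha = 1 gives the plain MLE relative frequencies.
    nk = r.sum(axis=0)
    pi_new = (nk + alpha - 1) / (len(x) + k * (alpha - 1))
    mu_new = (r * x[:, None]).sum(axis=0) / nk
    sigma_new = np.sqrt((r * (x[:, None] - mu_new) ** 2).sum(axis=0) / nk)
    return mu_new, sigma_new, pi_new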
>> EM clustering can also be changed very slightly by assigning points to
>> single clusters chosen at random according to the probability of
>> membership. This turns EM clustering into Gibbs sampling.
>
> That is the simplest explanation of Gibbs sampling I have read so far :)
Of *one kind* of Gibbs sampling anyway. :)
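
For anyone following along, the change is as small as it sounds: sample a
single cluster label per point from the responsibilities instead of keeping
the soft weights, then re-estimate the cluster parameters from the hard
assignments. A minimal sketch of the sampling step (Python; the function
name is just for illustration):

import numpy as np

def sample_assignments(responsibilities, rng=np.random.default_rng()):
    # responsibilities: shape (n_points, n_clusters), rows sum to 1
    n, k = responsibilities.shape
    return np.array([rng.choice(k, p=responsibilities[i]) for i in range(n)])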
>> Further extension can also be made by assuming an infinite mixture model.
>> The implementation is only slightly more difficult and the result is a
>> (nearly) non-parametric clustering algorithm. I will attach an R
>> implementation for reference.
>
> I think the Dirichlet process based clustering comes with the handy property
> that you can avoid passing the number of parameters into the algorithm, right?
Yes. You still pass some parameters in via the prior, and there is one
parameter that expresses how aggressive the algorithm is about thinking
there is another cluster it doesn't yet know about.
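
Concretely, that parameter is the Dirichlet process concentration (usually
called alpha). In the Chinese restaurant process view, a point joins an
existing cluster with probability proportional to the cluster's size, or
starts a new cluster with probability proportional to alpha. A tiny sketch
(the function is hypothetical, just to show the role alpha plays):

import numpy as np

def crp_assignment_probs(cluster_sizes, alpha):
    # cluster_sizes: counts for the clusters seen so far
    weights = np.array(list(cluster_sizes) + [alpha], dtype=float)
    return weights / weights.sum()   # last entry = P(start a new cluster)

# e.g. crp_assignment_probs([5, 3], 1.0) -> approximately [0.556, 0.333, 0.111]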
> To me that seems better for realistic settings where you usually have some
> data available but you cannot tell how many clusters are there.
And it is even better when the number of clusters is ambiguous. In those
cases, getting a cloud of clusterings is very helpful because you can answer
some questions without ambiguity and leave others ambiguous, according to
what the data shows.
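
As a concrete example of what a cloud of clusterings buys you: given a set of
assignment samples from a Gibbs run, a question like "do points i and j
belong together?" gets a posterior probability rather than a forced yes/no.
A hypothetical sketch:

import numpy as np

def co_cluster_probability(assignment_samples, i, j):
    # assignment_samples: shape (n_samples, n_points), one clustering per row
    samples = np.asarray(assignment_samples)
    return np.mean(samples[:, i] == samples[:, j])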
> Do you think one can solve the original PLSI problem with Gibbs sampling or
> an infinite mixture model as well?
Yes. Gibbs sampling works fine for latent Dirichlet allocation (a sub-form
of discrete component allocation). Variational techniques also work, but
are more difficult to explain. The jury is still out on how well Gibbs
sampling works for infinite mixtures with text, but there are some very
exciting early developments.
See DCA and LDA and the later adaptations to non-parametric form in the
following references.
http://cosco.hiit.fi/Articles/buntineBohinj.pdf
http://www.cs.princeton.edu/~blei/papers/BleiNgJordan2003.pdf
http://citeseer.ist.psu.edu/756482.html
http://citeseer.ist.psu.edu/736109.html
http://citeseer.ist.psu.edu/756901.html
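
For reference, the collapsed Gibbs update for LDA is quite small; here is a
rough sketch in Python. The count arrays and the function itself are
assumptions made for the illustration (not an existing Mahout API); alpha and
beta are the usual Dirichlet hyperparameters.

import numpy as np

def resample_topic(doc_topic, topic_word, topic_totals, d, w, old_k,
                   alpha, beta, vocab_size, rng=np.random.default_rng()):
    # remove the word's current assignment from the counts
    doc_topic[d, old_k] -= 1
    topic_word[old_k, w] -= 1
    topic_totals[old_k] -= 1

    # full conditional: P(z = k) is proportional to
    # (doc-topic count + alpha) * (topic-word count + beta) / (topic total + V * beta)
    probs = (doc_topic[d] + alpha) * (topic_word[:, w] + beta) \
            / (topic_totals + vocab_size * beta)
    probs /= probs.sum()
    new_k = rng.choice(len(probs), p=probs)

    # add the word back under its new topic
    doc_topic[d, new_k] += 1
    topic_word[new_k, w] += 1
    topic_totals[new_k] += 1
    return new_k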
> After all, the original patch was about
> integrating PLSI, which is based on EM. I wonder whether one should split this
> thread into at least four threads:
> - EM implementation
> - Gibbs sampling implementation
> - Dirichlet process implementation
> - PLSI based on EM
>
> What do you think?
+10.
The first and last might be the same unless somebody is interested in doing
a general-purpose EM implementation.