Ted may have a better one, but from my quick poking around, http://gibbslda.sourceforge.net/ looks to be a good implementation of the Gibbs sampling approach.
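
For anyone who wants to see the shape of the algorithm before reading a full implementation, here is a minimal sketch of a sequential collapsed Gibbs sampler for LDA. It is not taken from GibbsLDA++ or from any Mahout code. The function name `lda_gibbs`, its parameters, and the default values for `alpha` and `beta` are all assumptions made for illustration, and documents are assumed to be already converted to lists of integer word ids.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
              iters=200, seed=0):
    """Illustrative collapsed Gibbs sampler for LDA (hypothetical helper).

    docs: list of documents, each a list of word ids in [0, vocab_size).
    Returns (doc_topic, topic_word, z): the two count tables and the
    final topic assignment of every token.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]            # n_dk
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n_kw
    topic_total = [0] * num_topics                          # n_k
    z = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        assignments = []
        for w in doc:
            k = rng.randrange(num_topics)
            assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        z.append(assignments)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # p(z = t | everything else) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]

                # Record the new assignment and restore the counts.
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word, z
```

Calling `lda_gibbs([[0, 1, 0, 1], [2, 3, 2, 3]], num_topics=2, vocab_size=4)` returns the per-document topic counts, the per-topic word counts, and the token assignments. As a guess at how this would map onto Map-Reduce, each map task could run the inner loop over its own partition of documents against a local copy of `topic_word`, with the count changes merged in the reduce step; that would be an approximation of the sequential sampler, not an exact equivalent.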
-----Original Message-----
From: Goel, Ankur [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 04, 2008 4:58 AM
To: [email protected]
Subject: RE: LDA [was RE: Taste on Mahout]

Ted,
Do you have a sequential version of an LDA implementation that can be used for reference? If yes, can you please post it on Jira? Should we open a new Jira or use MAHOUT-30 for this?

-----Original Message-----
From: Ted Dunning [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 27, 2008 11:50 AM
To: [email protected]
Subject: Re: LDA [was RE: Taste on Mahout]

Chris Bishop's book has a very clear exposition of the relationship between the variational techniques and EM. Very good reading.

On Mon, May 26, 2008 at 10:13 PM, Goel, Ankur <[EMAIL PROTECTED]> wrote:

> Daniel/Ted,
> Thanks for the interesting pointers to more information on LDA
> and EM.
> I am going through the docs to visualize and understand how the LDA
> approach would work for my specific case.
>
> Once I have some idea, I can volunteer to work on the Map-Reduce side
> of things as this is something that will benefit both my project and
> the community.
>
> Looking forward to sharing more ideas/information on this :-)
>
> Regards
> -Ankur
>
> -----Original Message-----
> From: Ted Dunning [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, May 27, 2008 6:59 AM
> To: [email protected]
> Subject: Re: LDA [was RE: Taste on Mahout]
>
> Those are both new to me. Both look interesting. My own experience
> is that the simplicity of Gibbs sampling makes it very much more
> attractive for implementation. Also, since it is (nearly) trivially
> parallelizable, it is more likely we will get a useful implementation
> right off the bat.
>
> On Mon, May 26, 2008 at 5:49 PM, Daniel Kluesing <[EMAIL PROTECTED]>
> wrote:
>
> > (Hijacking the thread to discuss ways to implement LDA)
> >
> > Had you seen
> > http://books.nips.cc/papers/files/nips20/NIPS2007_0672.pdf ?
> >
> > Their hierarchical distributed LDA formulation uses Gibbs sampling
> > and fits into mapreduce.
> >
> > http://www.cs.berkeley.edu/~jawolfe/pubs/08-icml-em.pdf gives a
> > mapreduce formulation for the variational EM method.
> >
> > I'm still chewing on them, but my first impression is that the EM
> > approach would give better performance on bigger data sets. Opposing
> > views welcome.
> >
>

-- ted
