RE: LDA [was RE: Taste on Mahout]

Daniel Kluesing Wed, 04 Jun 2008 08:02:10 -0700

Ted may have a better one, but in my quick poking around at things
http://gibbslda.sourceforge.net/ looks to be a good implementation of
the Gibbs sampling approach.
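
For reference, here is a minimal sequential sketch of collapsed Gibbs sampling for LDA, the general technique behind implementations such as GibbsLDA++. It is not that project's code. The function name `lda_gibbs`, its parameters, and the default hyperparameters `alpha` and `beta` are illustrative assumptions.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
              iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA (hypothetical helper, a sketch).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (n_dk, n_kw): document-topic and topic-word count tables
    after the final sweep.
    """
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    n_dk = [[0] * K for _ in docs]        # tokens in doc d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]    # times word w is assigned to topic k
    n_k = [0] * K                         # total tokens assigned to topic k

    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    topics = range(K)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts...
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # ...sample a new topic from its full conditional,
                # p(z = t | rest) proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta)...
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                           / (n_k[t] + V * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                # ...and put it back under the new topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return n_dk, n_kw
```

Usage: `n_dk, n_kw = lda_gibbs([[0, 1, 0, 1], [2, 3, 2, 3]], num_topics=2, vocab_size=4)`. The usual point estimates then follow by smoothing the counts: theta_dk = (n_dk + alpha) / (N_d + K*alpha) and phi_kw = (n_kw + beta) / (n_k + V*beta).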


-----Original Message-----
From: Goel, Ankur [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, June 04, 2008 4:58 AM
To: [email protected]
Subject: RE: LDA [was RE: Taste on Mahout]

Ted, do you have a sequential LDA implementation that can be
used for reference?
If so, could you please post it on Jira? Should we open a new Jira issue or
use MAHOUT-30 for this?

-----Original Message-----
From: Ted Dunning [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 27, 2008 11:50 AM
To: [email protected]
Subject: Re: LDA [was RE: Taste on Mahout]

Chris Bishop's book has a very clear exposition of the relationship
between the variational techniques and EM.  Very good reading.
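
The relationship can be summarized by the standard decomposition of the log-likelihood. The notation here is assumed: X is the observed data, Z the latent variables, theta the parameters, and q any distribution over Z.

```latex
\ln p(X \mid \theta) = \mathcal{L}(q, \theta) + \mathrm{KL}(q \,\|\, p),
\qquad
\mathcal{L}(q, \theta) = \sum_{Z} q(Z) \ln \frac{p(X, Z \mid \theta)}{q(Z)},
\qquad
\mathrm{KL}(q \,\|\, p) = -\sum_{Z} q(Z) \ln \frac{p(Z \mid X, \theta)}{q(Z)}
```

Because the KL term is non-negative, L is a lower bound on the log-likelihood. EM alternates two steps. The E-step maximizes L over q with theta fixed; the optimum is q(Z) = p(Z | X, theta), which drives the KL term to zero. The M-step then maximizes L over theta. Variational EM keeps the same alternation but restricts q to a tractable family, for example a fully factorized one, whenever the exact posterior cannot be computed, as with LDA. In that case the E-step only shrinks the KL term, and L stays a strict lower bound in general.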

On Mon, May 26, 2008 at 10:13 PM, Goel, Ankur <[EMAIL PROTECTED]>
wrote:

> Daniel/Ted,
>      Thanks for the interesting pointers to more information on LDA 
> and EM.
> I am going through the docs to visualize and understand how the LDA
> approach would work for my specific case.
>
> Once I have some idea, I can volunteer to work on the Map-Reduce side
> of things, as this is something that will benefit both my project and
> the community.
>
> Looking forward to sharing more ideas/information on this :-)
>
> Regards
> -Ankur
>
> -----Original Message-----
> From: Ted Dunning [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, May 27, 2008 6:59 AM
> To: [email protected]
> Subject: Re: LDA [was RE: Taste on Mahout]
>
> Those are both new to me.  Both look interesting.  My own experience 
> is that the simplicity of Gibbs sampling makes it much more
> attractive for implementation.  Also, since it is (nearly) trivially 
> parallelizable, it is more likely we will get a useful implementation 
> right off the bat.
>
> On Mon, May 26, 2008 at 5:49 PM, Daniel Kluesing 
> <[EMAIL PROTECTED]>
> wrote:
>
> > (Hijacking the thread to discuss ways to implement LDA)
> >
> > Have you seen
> > http://books.nips.cc/papers/files/nips20/NIPS2007_0672.pdf ?
> >
> > Their hierarchical distributed LDA formulation uses Gibbs sampling
> > and fits into mapreduce.
> >
> > http://www.cs.berkeley.edu/~jawolfe/pubs/08-icml-em.pdf gives a
> > mapreduce formulation for the variational EM method.
> >
> > I'm still chewing on them, but my first impression is that the EM
> > approach would give better performance on bigger data sets. Opposing
> > views welcome.
> >
> >
>



--
ted
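
The (nearly) trivial parallelism discussed in this thread can be illustrated with a minimal sketch in the style of approximate distributed LDA (often called AD-LDA). Each mapper runs an ordinary collapsed Gibbs sweep over its own shard of documents against a snapshot of the global topic-word counts, and a reducer sums the shards' count changes. This is a simplification. It is neither the hierarchical distributed formulation from the NIPS paper linked in the thread nor Mahout code. The function names `sweep_shard`, `merge`, and `ad_lda` and the hyperparameter defaults are hypothetical.

```python
import random


def sweep_shard(docs, z, n_dk, global_kw, K, V, alpha, beta, rng):
    """Map step (hypothetical): one Gibbs sweep over one shard of documents.

    The shard samples against a private snapshot of the global topic-word
    counts, so it cannot see other shards' moves made in the same
    iteration; that staleness is the approximation. Returns the shard's
    change to the topic-word counts.
    """
    n_kw = [row[:] for row in global_kw]
    n_k = [sum(row) for row in n_kw]
    topics = range(K)
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d][k] -= 1
            n_kw[k][w] -= 1
            n_k[k] -= 1
            weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                       / (n_k[t] + V * beta) for t in topics]
            k = rng.choices(topics, weights=weights)[0]
            z[d][i] = k
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
    return [[n_kw[k][w] - global_kw[k][w] for w in range(V)] for k in topics]


def merge(global_kw, deltas):
    """Reduce step (hypothetical): add every shard's delta to the global counts."""
    merged = [row[:] for row in global_kw]
    for delta in deltas:
        for k, row in enumerate(delta):
            for w, change in enumerate(row):
                merged[k][w] += change
    return merged


def ad_lda(shards, K, V, alpha=0.1, beta=0.01, iterations=50, seed=0):
    """Run map/reduce rounds in-process; a real job would run the
    sweep_shard calls on separate mappers in parallel."""
    rng = random.Random(seed)
    global_kw = [[0] * V for _ in range(K)]
    state = []  # per-shard (z, n_dk); only topic-word counts are shared
    for docs in shards:
        z = [[rng.randrange(K) for _ in doc] for doc in docs]
        n_dk = [[0] * K for _ in docs]
        for d, doc in enumerate(docs):
            for w, k in zip(doc, z[d]):
                n_dk[d][k] += 1
                global_kw[k][w] += 1
        state.append((z, n_dk))
    for _ in range(iterations):
        deltas = [sweep_shard(docs, z, n_dk, global_kw, K, V, alpha, beta, rng)
                  for docs, (z, n_dk) in zip(shards, state)]
        global_kw = merge(global_kw, deltas)
    return global_kw
```

Each shard's delta reflects only its own tokens' moves. After every round, the merged counts therefore equal the true topic-word counts for the current assignments. The approximation lies only in the stale counts each shard samples against during a round.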
