Re: LDA [was RE: Taste on Mahout]

Ted Dunning Thu, 05 Jun 2008 09:30:27 -0700

The Buntine and Jakulin paper is also useful reading. I would avoid fancy
stuff like the powell rao-ization to start.


http://citeseer.ist.psu.edu/750239.html

The Gibbs sampling approach is, at its heart, very simple: most of
the math devolves into sampling discrete hidden variables from simple
distributions and then counting the results as if they were observed.
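To illustrate that "sample, then count" loop, here is a minimal collapsed Gibbs sampler for LDA in Python. It is a sketch under stated assumptions, not Mahout code and not a port of the LdaGibbsSampler.java class referenced in the thread. The function name `gibbs_lda`, the symmetric priors `alpha` and `beta`, and their defaults are illustrative choices. For each token, the sampler removes the token's current topic from the count tables, draws a new topic from the standard collapsed conditional p(z = k | rest) ∝ (n_dk + alpha)(n_kw + beta) / (n_k + V*beta), and then counts the draw back in as if it were observed.

```python
import random


def gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
              iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (z, ndk, nkw, nk): per-token topic assignments and the count tables.
    """
    rng = random.Random(seed)
    ndk = [[0] * num_topics for _ in docs]               # doc-topic counts
    nkw = [[0] * vocab_size for _ in range(num_topics)]  # topic-word counts
    nk = [0] * num_topics                                # tokens per topic
    z = []                                               # topic of each token

    # Random initial assignment, counted as if it were observed.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token's current assignment out of the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Unnormalized collapsed conditional for every topic.
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + vocab_size * beta)
                           for t in range(num_topics)]
                # Draw a new topic from that discrete distribution.
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Count the new assignment as if it were observed.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return z, ndk, nkw, nk
```

After sampling, topic-word distributions can be estimated from the counts as `(nkw[k][w] + beta) / (nk[k] + vocab_size * beta)`. Every step is a discrete draw followed by integer bookkeeping, which is what keeps the approach simple to implement.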

On Thu, Jun 5, 2008 at 5:49 AM, Goel, Ankur <[EMAIL PROTECTED]> wrote:

> It draws on the Java implementation at
> http://www.arbylon.net/projects/LdaGibbsSampler.java
> (a single-class version of LDA using Gibbs sampling with
> slightly better code documentation).
> I am trying to understand the code while reading the paper you suggested,
> "Distributed Inference for Latent Dirichlet Allocation".
>
> -----Original Message-----
> From: Daniel Kluesing [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, June 04, 2008 8:31 PM
> To: [email protected]
> Subject: RE: LDA [was RE: Taste on Mahout]
>
> Ted may have a better one, but in my quick poking around at things
> http://gibbslda.sourceforge.net/ looks to be a good implementation of
> the Gibbs sampling approach.
>
> -----Original Message-----
> From: Goel, Ankur [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, June 04, 2008 4:58 AM
> To: [email protected]
> Subject: RE: LDA [was RE: Taste on Mahout]
>
> Ted, do you have a sequential LDA implementation that can be
> used for reference?
> If so, can you please post it on Jira? Should we open a new Jira issue
> or use MAHOUT-30 for this?
>
> -----Original Message-----
> From: Ted Dunning [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, May 27, 2008 11:50 AM
> To: [email protected]
> Subject: Re: LDA [was RE: Taste on Mahout]
>
> Chris Bishop's book has a very clear exposition of the relationship
> between the variational techniques and EM.  Very good reading.
>
> On Mon, May 26, 2008 at 10:13 PM, Goel, Ankur <[EMAIL PROTECTED]>
> wrote:
>
> > Daniel/Ted,
> >      Thanks for the interesting pointers to more information on LDA
> > and EM.
> > I am going through the docs to visualize and understand how the LDA
> > approach would work for my specific case.
> >
> > Once I have some idea, I can volunteer to work on the Map-Reduce side
> > of things, as this is something that will benefit both my project and
> > the community.
> >
> > Looking forward to sharing more ideas/information on this :-)
> >
> > Regards
> > -Ankur
> >
> > -----Original Message-----
> > From: Ted Dunning [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, May 27, 2008 6:59 AM
> > To: [email protected]
> > Subject: Re: LDA [was RE: Taste on Mahout]
> >
> > Those are both new to me, and both look interesting. My own experience
> > is that the simplicity of Gibbs sampling makes it much more
> > attractive to implement. Also, since it is (nearly) trivially
> > parallelizable, we are more likely to get a useful implementation
> > right off the bat.
> >
> > On Mon, May 26, 2008 at 5:49 PM, Daniel Kluesing <[EMAIL PROTECTED]>
> > wrote:
> >
> > > (Hijacking the thread to discuss ways to implement LDA)
> > >
> > > Have you seen
> > > http://books.nips.cc/papers/files/nips20/NIPS2007_0672.pdf ?
> > >
> > > Their hierarchical distributed LDA formulation uses Gibbs sampling
> > > and fits into MapReduce.
> > >
> > > http://www.cs.berkeley.edu/~jawolfe/pubs/08-icml-em.pdf gives a
> > > MapReduce formulation for the variational EM method.
> > >
> > > I'm still chewing on them, but my first impression is that the EM
> > > approach would give better performance on bigger data sets. Opposing
> > > views welcome.
> > >
> > >
> >
>
>
>
> --
> ted
>

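The point in the thread that Gibbs sampling is "(nearly) trivially parallelizable" can be illustrated with a sketch loosely modeled on the approximate distributed scheme (AD-LDA) from "Distributed Inference for Latent Dirichlet Allocation". Assumptions, all simplifications: documents are pre-split into shards; each shard runs an ordinary collapsed Gibbs sweep against its own copy of the global topic-word counts taken at the start of the iteration (the "map"); and the per-shard count changes are then summed into the global counts (the "reduce"). Shards run one after another here rather than on separate machines. The names `sweep` and `ad_lda` are hypothetical. The result is approximate because a shard does not see other shards' updates until the merge.

```python
import random


def sweep(docs, z, ndk, nkw, nk, alpha, beta, rng):
    """One collapsed Gibbs sweep over a shard; updates z and all counts in place."""
    num_topics, vocab_size = len(nk), len(nkw[0])
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d][k] -= 1  # take the token out of the counts
            nkw[k][w] -= 1
            nk[k] -= 1
            weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                       / (nk[t] + vocab_size * beta) for t in range(num_topics)]
            k = rng.choices(range(num_topics), weights=weights)[0]
            z[d][i] = k  # count the new draw as if it were observed
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1


def ad_lda(shards, num_topics, vocab_size, alpha=0.1, beta=0.01,
           iterations=50, seed=0):
    """shards: list of shards, each a list of documents (lists of word ids)."""
    rng = random.Random(seed)
    nkw = [[0] * vocab_size for _ in range(num_topics)]  # global topic-word counts
    nk = [0] * num_topics                                # global topic totals
    state = []                                           # per shard: (z, ndk)
    for docs in shards:
        z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
        ndk = [[0] * num_topics for _ in docs]
        for d, doc in enumerate(docs):
            for w, k in zip(doc, z[d]):
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
        state.append((z, ndk))

    for _ in range(iterations):
        new_nkw = [row[:] for row in nkw]
        new_nk = nk[:]
        for docs, (z, ndk) in zip(shards, state):
            # "Map": sample against a private, possibly stale copy of the global counts.
            local_nkw = [row[:] for row in nkw]
            local_nk = nk[:]
            sweep(docs, z, ndk, local_nkw, local_nk, alpha, beta, rng)
            # "Reduce": add this shard's changes to the new global counts.
            for k in range(num_topics):
                new_nk[k] += local_nk[k] - nk[k]
                for w in range(vocab_size):
                    new_nkw[k][w] += local_nkw[k][w] - nkw[k][w]
        nkw, nk = new_nkw, new_nk
    return state, nkw, nk
```

Because each document lives in exactly one shard, the doc-topic counts never leave their shard. Only the topic-word table has to be merged, which is the part a MapReduce job would aggregate.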


-- 
ted
