[ https://issues.apache.org/jira/browse/MAHOUT-815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109724#comment-13109724 ]

Jake Mannix commented on MAHOUT-815:
------------------------------------

I would suggest holding off on this a little, or else looking at my complete 
reworking of Mahout's LDA implementation over on GitHub: 
https://github.com/jakemannix/Mahout (look on the "cvb0" branch). I've moved 
from doing straightforward Variational Bayes (as in the original paper) to 
"Collapsed Variational Bayes" with some approximations, which speeds it up by a 
factor of 10-15 and no longer requires the entire model to live in memory.

Refactoring on the current codebase will get squashed by these changes, I'm 
afraid.  I'll really try to clean that code up and put up a patch for review 
this week or next.
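The comment names collapsed variational Bayes without showing the update. As a rough illustration only, the zeroth-order approximation commonly called CVB0 gives each token a soft topic distribution, removes that token's own contribution from the expected counts, and recomputes it from the leave-one-out word-topic, document-topic, and topic-total counts. This is a minimal plain-Python sketch under those assumptions, not the code on the "cvb0" branch or any Mahout API. The names `init_counts`, `cvb0_sweep`, `alpha`, and `eta` are hypothetical. It also keeps every count in memory, so it does not show the memory savings described above.

```python
import random

def init_counts(docs, num_topics, vocab_size, seed=0):
    """Randomly initialize per-token topic responsibilities and expected counts."""
    rng = random.Random(seed)
    n_wk = [[0.0] * num_topics for _ in range(vocab_size)]  # expected word-topic counts
    n_k = [0.0] * num_topics                                 # expected topic totals
    n_dk = [[0.0] * num_topics for _ in docs]                # expected doc-topic counts
    gamma = []
    for d, doc in enumerate(docs):
        doc_gamma = []
        for w in doc:
            g = [rng.random() for _ in range(num_topics)]
            s = sum(g)
            g = [x / s for x in g]  # normalize to a distribution over topics
            for k in range(num_topics):
                n_wk[w][k] += g[k]
                n_k[k] += g[k]
                n_dk[d][k] += g[k]
            doc_gamma.append(g)
        gamma.append(doc_gamma)
    return gamma, n_wk, n_k, n_dk

def cvb0_sweep(docs, gamma, n_wk, n_k, n_dk, alpha, eta):
    """One in-place CVB0 pass over every token (hypothetical helper)."""
    num_topics = len(n_k)
    vocab_size = len(n_wk)
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            old = gamma[d][i]
            # Leave-one-out: remove this token's own soft assignment from the counts.
            for k in range(num_topics):
                n_wk[w][k] -= old[k]
                n_k[k] -= old[k]
                n_dk[d][k] -= old[k]
            # Zeroth-order update: expected counts stand in for sampled assignments.
            new = [(n_wk[w][k] + eta) * (n_dk[d][k] + alpha) / (n_k[k] + vocab_size * eta)
                   for k in range(num_topics)]
            s = sum(new)
            new = [x / s for x in new]
            # Add the refreshed soft assignment back into the counts.
            for k in range(num_topics):
                n_wk[w][k] += new[k]
                n_k[k] += new[k]
                n_dk[d][k] += new[k]
            gamma[d][i] = new
```

Usage: build the counts once with `init_counts(docs, num_topics, vocab_size)`, then call `cvb0_sweep` repeatedly until the responsibilities stop changing. Each token's distribution always sums to 1, so the topic totals always add up to the number of tokens in the corpus.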

> LDA Inference Corrections, Alpha (Dirichlet) Estimation
> -------------------------------------------------------
>
>                 Key: MAHOUT-815
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-815
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.6
>            Reporter: Christoph Boden
>            Assignee: Sebastian Schelter
>
> Hi, I am a PhD student at TU Berlin DIMA. I am currently working on Mahout's 
> LDA implementation together with Sebastian Schelter. We identified a couple 
> of points that can be fixed or improved in the current version.
> We propose to fix the inference in the expectation step of EM in accordance 
> with [1], to implement maximum likelihood estimation of the Dirichlet 
> parameter (alpha) as presented in [1], and to do some refactoring.
> [1] Blei, David M.; Ng, Andrew Y.; Jordan, Michael I. (January 2003). 
> Lafferty, John (ed.). "Latent Dirichlet allocation". Journal of Machine 
> Learning Research 3 (4-5): pp. 993-1022. doi:10.1162/jmlr.2003.3.4-5.993

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
