[
https://issues.apache.org/jira/browse/MAHOUT-399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148329#comment-13148329
]
Jake Mannix commented on MAHOUT-399:
------------------------------------
Note that the code I'll be running it against differs in algorithmic detail
from both Mike's code (collapsed Gibbs sampling) and David's original
implementation here (based on variational Bayes): it's a parallel version of
an approximate collapsed variational Bayes (cf. the algorithm labeled "cvb0"
here: http://www.datalab.uci.edu/papers/uai_2009.pdf )
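For readers who have not seen CVB0, here is a minimal serial sketch of the
update as I understand it from the paper linked above. It is not the Mahout
code and not the parallel version; the function name, the parameter names
(alpha, eta) and the default values are assumptions made for illustration
only. Each token keeps a soft topic assignment, and each update recomputes it
from the expected counts with that token's own contribution removed.

```python
import random


def cvb0(docs, num_topics, vocab_size, alpha=0.1, eta=0.01, iters=50, seed=0):
    """Toy serial CVB0 for LDA (illustrative sketch, not Mahout's code).

    docs is a list of documents, each a list of integer word ids.
    Returns (n_wk, n_dk): expected word-topic and document-topic counts.
    """
    rng = random.Random(seed)
    K, W = num_topics, vocab_size
    n_wk = [[0.0] * K for _ in range(W)]   # expected count of word w in topic k
    n_dk = [[0.0] * K for _ in docs]       # expected count of topic k in doc d
    n_k = [0.0] * K                        # expected total count of topic k
    gamma = []                             # gamma[d][i][k]: soft assignment per token

    # Random normalized initialization of every token's topic distribution.
    for d, doc in enumerate(docs):
        doc_gamma = []
        for w in doc:
            g = [rng.random() for _ in range(K)]
            total = sum(g)
            g = [x / total for x in g]
            doc_gamma.append(g)
            for k in range(K):
                n_wk[w][k] += g[k]
                n_dk[d][k] += g[k]
                n_k[k] += g[k]
        gamma.append(doc_gamma)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                old = gamma[d][i]
                new = []
                for k in range(K):
                    # Counts excluding this token's current soft assignment.
                    wk = n_wk[w][k] - old[k]
                    dk = n_dk[d][k] - old[k]
                    tk = n_k[k] - old[k]
                    new.append((wk + eta) / (tk + W * eta) * (dk + alpha))
                total = sum(new)
                new = [x / total for x in new]
                # Fold the change back into the running expected counts.
                for k in range(K):
                    delta = new[k] - old[k]
                    n_wk[w][k] += delta
                    n_dk[d][k] += delta
                    n_k[k] += delta
                gamma[d][i] = new
    return n_wk, n_dk
```

Calling cvb0([[0, 0, 1, 1], [2, 3, 2, 3]], num_topics=2, vocab_size=4) returns
the two count tables; ranking the words by n_wk[w][k] within each topic k gives
the kind of per-topic term lists shown in the issue below.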
> LDA on Mahout 0.3 does not converge to correct solution for overlapping
> pyramids toy problem.
> ---------------------------------------------------------------------------------------------
>
> Key: MAHOUT-399
> URL: https://issues.apache.org/jira/browse/MAHOUT-399
> Project: Mahout
> Issue Type: Bug
> Components: Classification
> Affects Versions: 0.3, 0.4, 0.5
> Environment: Mac OS X 10.6.2, Hadoop 0.20.2, Mahout 0.3.
> Reporter: Michael Lazarus
> Assignee: Grant Ingersoll
> Labels: lda, mahout
> Fix For: 0.6
>
> Attachments: Overlapping Pyramids Toy Dataset.pdf, olt.tar
>
>
> Hello,
> Apologies if I have not labeled this correctly.
> I have run a toy problem on Mahout 0.3 (locally) for LDA that I used to test
> Blei's C version of LDA that he posts on his site. It has an exact solution
> that the LDA should converge to. Please see attached PDF that describes the
> intended output.
> Is LDA working? The following output indicates some sort of collapsing
> behavior to me.
> T0 T1 T2 T3 T4
> x w x u x
> u u g j n
> l r i m l
> j q h h p
> v p e i q
> e t f g v
> d s d f o
> b c b n k
> y f c l m
> w v u v u
> c d p y t
> k o l r r
> i b j k j
> f e k e f
> g x y s y
> t y w b w
> h i s p s
> o l v x d
> q j t d i
> n k o t b
> The intended output is (again, please see attached):
> D I N S X
> d i n s x
> c h m t y
> e j o r w
> b k l u v
> f g p q a
> a f k p b
> g l q v u
> h m j w t
> y u r o c
> n s d d i
> s e x f f
> r q i i n
> m v w c o
> o w u a h
> q n s h g
> p t c x d
> t x f e l
> x d e j s
> w y g b j
> i r y n r
> u o h y m
> k b t l e
> v c a m k
> j a b g p
> l p v k q
> What tests do you run to make sure the output is correct?
> Thank you,
> Mike.