[ https://issues.apache.org/jira/browse/MAHOUT-399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179670#comment-13179670 ]

Jake Mannix commented on MAHOUT-399:
------------------------------------

Haven't really looked at it. I'd say that the original Mahout LDA (David
Hall's version) has corner cases where it doesn't converge properly, even on a
clearly defined, topic-derived small corpus. This test passes for the new LDA
impl (CVB0). We can close this one as "fixed in one impl, won't fix in
another" and open another JIRA ticket for 0.7 ("remove old LDA"), to be
resolved once we verify that users have tried the new one on a variety of data
sets and like it better. Right now we're relying on the fact that my coworkers
and I have used it successfully in-house. That's not a lot of verification to
go on, but I'd feel comfortable removing the old LDA in 0.7 even without much
test feedback from other people. I'm open to discussion on that, though.
                
> LDA on Mahout 0.3 does not converge to correct solution for overlapping 
> pyramids toy problem.
> ---------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-399
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-399
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification
>    Affects Versions: 0.3, 0.4, 0.5
>         Environment: Mac OS X 10.6.2, Hadoop 0.20.2, Mahout 0.3.
>            Reporter: Michael Lazarus
>            Assignee: Jake Mannix
>              Labels: lda, mahout
>             Fix For: 0.6
>
>         Attachments: 1000docs_26terms_5topics.jpg, MAHOUT-399.diff, 
> Overlapping Pyramids Toy Dataset.pdf, olt.tar
>
>
> Hello,
> Apologies if I have not labeled this correctly.
> I have run a toy problem for LDA on Mahout 0.3 (locally); I previously used
> the same problem to test Blei's C version of LDA, which he posts on his site.
> It has an exact solution that LDA should converge to. Please see the attached
> PDF, which describes the intended output.
> Is LDA working? The following output suggests some sort of collapsing
> behavior to me.
> T0    T1      T2      T3      T4
> x     w       x       u       x
> u     u       g       j       n
> l     r       i       m       l
> j     q       h       h       p
> v     p       e       i       q
> e     t       f       g       v
> d     s       d       f       o
> b     c       b       n       k
> y     f       c       l       m
> w     v       u       v       u
> c     d       p       y       t
> k     o       l       r       r
> i     b       j       k       j
> f     e       k       e       f
> g     x       y       s       y
> t     y       w       b       w
> h     i       s       p       s
> o     l       v       x       d
> q     j       t       d       i
> n     k       o       t       b
> The intended output is (again, please see the attached PDF):
> D     I       N       S       X
> d     i       n       s       x
> c     h       m       t       y
> e     j       o       r       w
> b     k       l       u       v
> f     g       p       q       a
> a     f       k       p       b
> g     l       q       v       u
> h     m       j       w       t
> y     u       r       o       c
> n     s       d       d       i
> s     e       x       f       f
> r     q       i       i       n
> m     v       w       c       o
> o     w       u       a       h
> q     n       s       h       g
> p     t       c       x       d
> t     x       f       e       l
> x     d       e       j       s
> w     y       g       b       j
> i     r       y       n       r
> u     o       h       y       m
> k     b       t       l       e
> v     c       a       m       k
> j     a       b       g       p
> l     p       v       k       q
> What tests do you run to make sure the output is correct?
> Thank you,
> Mike.
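One way to answer the question above automatically is to compare each learned topic's top words with the intended ones. The following Python sketch is a hypothetical check, not Mahout's actual test: `check_topics`, the top-k size of 7 and the `min_overlap` threshold of 5 are arbitrary assumptions. It uses the first seven rows of both tables from the report, flags learned topics whose best match shares too few top words, and flags several topics peaking at the same word, which is the collapse symptom visible in the 0.3 output.

```python
# Hypothetical regression check: compare learned topics with the intended ones.
LEARNED = {  # first seven rows of the Mahout 0.3 output above
    "T0": "xuljved", "T1": "wurqpts", "T2": "xgihefd",
    "T3": "ujmhigf", "T4": "xnlpqvo",
}
INTENDED = {  # first seven rows of the intended output above
    "D": "dcebfag", "I": "ihjkgfl", "N": "nmolpkq",
    "S": "strupqv", "X": "xywvabu",
}


def check_topics(learned, expected, k=7, min_overlap=5):
    """Return a list of problems; an empty list means the topics look recovered."""
    problems = []
    matched = {}
    for name, words in learned.items():
        top = set(words[:k])
        # Best-matching expected topic by shared top-k words (ties: first one wins).
        best = max(expected, key=lambda e: len(top & set(expected[e][:k])))
        overlap = len(top & set(expected[best][:k]))
        if overlap < min_overlap:
            problems.append(f"{name}: best match {best} shares only {overlap}/{k} top words")
        if best in matched:
            problems.append(f"{name} and {matched[best]} both match {best}")
        else:
            matched[best] = name
    # Collapse symptom: several learned topics peaking at the same word.
    peaks = {}
    for name, words in learned.items():
        peaks.setdefault(words[0], []).append(name)
    for word, names in peaks.items():
        if len(names) > 1:
            problems.append(f"{', '.join(names)} all peak at {word!r}")
    for e in expected:
        if e not in matched:
            problems.append(f"expected topic {e} was not recovered")
    return problems


if __name__ == "__main__":
    for problem in check_topics(LEARNED, INTENDED):
        print(problem)
```

On the 0.3 output this reports that T0 and T2 match their best expected topics only weakly (3/7 and 4/7 shared words) and that T0, T2 and T4 all peak at 'x'; the intended output checked against itself yields no problems.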

--
This message is automatically generated by JIRA.