[ 
https://issues.apache.org/jira/browse/MAHOUT-1009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273110#comment-13273110
 ] 

Sebastian Schelter commented on MAHOUT-1009:
--------------------------------------------

+1 on removing the old LDA implementation. 

I know from folks at Yahoo that they were disappointed by its scaling behavior,
and a colleague of mine even suspects some errors in the computation.
                
> Remove old LDA implementation from codebase
> -------------------------------------------
>
>                 Key: MAHOUT-1009
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1009
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.7
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Minor
>             Fix For: 0.7
>
>
> The old LDA is unmaintained and unsupported.  We already (since 0.6) have a 
> newer, faster version in the o.a.m.clustering.lda.cvb package, which I'm 
> actively working on and using in production at Twitter.  We should delete the 
> old o.a.m.clustering.lda codebase.
> Normally, I'd say that we should at the same time promote 
> o.a.m.clustering.lda.cvb up a package level, but that would cause some 
> serious merge conflicts on my GitHub branch (with updates/improvements/new 
> features targeted for 0.8), so we can get users on this new code by simply 
> changing the driver.classes.props to have "lda" point to CVB0Driver as the 
> main().
> One thing that goes away entirely is the LDAPrintTopics class, but it's 
> replaced by simply running VectorDumper with the -sort option on the model 
> files, which is more standard anyway.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
