[ https://issues.apache.org/jira/browse/MAHOUT-1009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273110#comment-13273110 ]
Sebastian Schelter commented on MAHOUT-1009:
--------------------------------------------

+1 on removing the old LDA implementation. I know from folks at Yahoo that they were disappointed by its scaling behavior, and a colleague of mine even suspects some errors in the computation.

> Remove old LDA implementation from codebase
> -------------------------------------------
>
>                 Key: MAHOUT-1009
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1009
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.7
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Minor
>             Fix For: 0.7
>
>
> The old LDA is unmaintained and unsupported. We already (since 0.6) have a
> newer, faster version in the o.a.m.clustering.lda.cvb package, which I'm
> actively working on and using in production at Twitter. We should delete the
> old o.a.m.clustering.lda codebase.
> Normally, I'd say that we should at the same time promote
> o.a.m.clustering.lda.cvb up a package-level, but that would cause some
> serious merge conflicts on my GitHub branch (with updates/improvements/new
> features targeted for 0.8), so we can get users on this new code by simply
> changing the driver.classes.props to have "lda" point to CVB0Driver as the
> main().
> One thing which goes away entirely is the LDAPrintTopics class, but it's
> replaced by simply doing VectorDumper with the -sort option on the model
> files, which is more standard anyways.
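
For illustration only, the driver.classes.props change mentioned in the description might look roughly like the fragment below. This is a sketch, not the committed change: it assumes the file maps a fully qualified class name to a short name and description in the form "class = shortname : description", that the old driver is o.a.m.clustering.lda.LDADriver, and the description strings are made up for the example.

```properties
# Before (assumed): the "lda" short name resolves to the old driver
# org.apache.mahout.clustering.lda.LDADriver = lda : Latent Dirichlet Allocation

# After (assumed): "lda" resolves to the CVB implementation's main()
org.apache.mahout.clustering.lda.cvb.CVB0Driver = lda : LDA via Collapsed Variational Bayes
```

With a mapping like this, existing invocations that use the "lda" short name would run CVB0Driver without the package having to move.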