Jake Mannix created MAHOUT-1009:
-----------------------------------
Summary: Remove old LDA implementation from codebase
Key: MAHOUT-1009
URL: https://issues.apache.org/jira/browse/MAHOUT-1009
Project: Mahout
Issue Type: Improvement
Components: Clustering
Affects Versions: 0.7
Reporter: Jake Mannix
Priority: Minor
Fix For: 0.7
The old LDA is unmaintained and unsupported. We already (since 0.6) have a
newer, faster version in the o.a.m.clustering.lda.cvb package, which I'm
actively working on and using in production at Twitter. We should delete the
old o.a.m.clustering.lda codebase.
Normally, I'd say that we should also promote o.a.m.clustering.lda.cvb up one
package level at the same time, but that would cause serious merge conflicts
on my GitHub branch (which has updates, improvements, and new features targeted
for 0.8). Instead, we can get users onto the new code simply by changing
driver.classes.props so that "lda" points to CVB0Driver as the main().
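For illustration only, here is a minimal Java sketch of how a driver table could resolve the short name "lda" to CVB0Driver. It assumes a `className = alias : description` line format for driver.classes.props, and the class `DriverAliasSketch` and its `resolve` method are hypothetical; this is not MahoutDriver's actual code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/**
 * Hypothetical sketch: map a short program name ("lda") to a driver class.
 * The "className = alias : description" format is an assumption modeled on
 * driver.classes.props, not Mahout's real parsing logic.
 */
class DriverAliasSketch {

  /** Returns the fully qualified class whose alias matches, or null if none does. */
  static String resolve(String propsText, String alias) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(propsText));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    for (String className : props.stringPropertyNames()) {
      // The value looks like "lda : LDA via CVB0"; the alias is the part before ':'.
      String value = props.getProperty(className);
      int colon = value.indexOf(':');
      String name = (colon >= 0 ? value.substring(0, colon) : value).trim();
      if (name.equals(alias)) {
        return className;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // After the proposed change, "lda" names the CVB0 driver instead of the old one.
    String props = "org.apache.mahout.clustering.lda.cvb.CVB0Driver = lda : LDA via CVB0\n";
    System.out.println(resolve(props, "lda"));
  }
}
```

The point of the sketch is that only the alias line changes: users keep typing "lda" and silently get the CVB0 implementation.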
One thing that goes away entirely is the LDAPrintTopics class, but it's
replaced by running VectorDumper with the -sort option on the model files,
which is more standard anyway.
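As a rough sketch of what a sorted dump conveys, the Java below lists the highest-weighted terms of one topic. It assumes each model row is one topic's weight vector over the vocabulary; plain arrays stand in for Mahout vectors and the term dictionary, and `TopTermsSketch` is a hypothetical name, not part of VectorDumper.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch: top-n terms of a single topic vector, sorted by
 * descending weight. Not VectorDumper's implementation.
 */
class TopTermsSketch {

  /** Returns up to n "term:weight" entries, largest weight first. */
  static List<String> topTerms(double[] topicWeights, String[] dictionary, int n) {
    List<Integer> indices = new ArrayList<>();
    for (int i = 0; i < topicWeights.length; i++) {
      indices.add(i);
    }
    // Order term indices by their weight in this topic, descending.
    indices.sort(Comparator.comparingDouble((Integer i) -> topicWeights[i]).reversed());
    List<String> result = new ArrayList<>();
    for (int k = 0; k < Math.min(n, indices.size()); k++) {
      int i = indices.get(k);
      result.add(dictionary[i] + ":" + topicWeights[i]);
    }
    return result;
  }

  public static void main(String[] args) {
    String[] dictionary = {"apple", "banana", "cherry", "date"};
    double[] topic = {0.1, 0.5, 0.05, 0.35};
    System.out.println(topTerms(topic, dictionary, 2)); // prints [banana:0.5, date:0.35]
  }
}
```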
--
This message is automatically generated by JIRA.