[ https://issues.apache.org/jira/browse/MAHOUT-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jake Mannix updated MAHOUT-1051:
--------------------------------
Attachment: MAHOUT-1051.patch
Slight modifications: avoid the try/catch block and use the "shallow-copy" form
of SparseRowMatrix.
The disadvantage is that if numRows is very large but most rows are empty, this
wastes memory; the advantage is that it does not care which Vector
implementation it is given.
Does this work for you, Gokhan?
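
To make the trade-off above concrete, here is a minimal sketch in plain Java. It
does not use the Mahout API and is not the code in the attached patch:
DocIdRows and toRowList are hypothetical names, and a generic List stands in
for the row array that a shallow-copy matrix would hold. Each document is
stored by reference at the slot given by its integer id, so any vector type
works, but every id that never occurs still costs one empty slot.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of Mahout.
final class DocIdRows {

    // Returns a list whose index is the document id. Rows are stored by
    // reference (no copy), and ids that never occur are left as null.
    static <V> List<V> toRowList(Map<Integer, V> docs) {
        int numRows = 0;
        for (int id : docs.keySet()) {
            if (id < 0) {
                throw new IllegalArgumentException("negative document id: " + id);
            }
            // The row count is driven by the largest id, not by the number
            // of documents actually present.
            numRows = Math.max(numRows, id + 1);
        }
        // One slot per possible id: this is the memory cost when ids are sparse.
        List<V> rows = new ArrayList<>(Collections.<V>nCopies(numRows, null));
        for (Map.Entry<Integer, V> e : docs.entrySet()) {
            rows.set(e.getKey(), e.getValue());
        }
        return rows;
    }
}
```

With two documents whose ids are 0 and 1000000, this sketch allocates 1000001
slots and fills only two of them, which is the memory concern described above.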
> InMemoryCollapsedVariationalBayes0 to load input vectors with docIDs
> --------------------------------------------------------------------
>
> Key: MAHOUT-1051
> URL: https://issues.apache.org/jira/browse/MAHOUT-1051
> Project: Mahout
> Issue Type: Improvement
> Components: Clustering
> Affects Versions: 0.8
> Reporter: Gokhan Capan
> Priority: Minor
> Labels: cvb, lda
> Fix For: 0.8
>
> Attachments: MAHOUT-1051.patch, MAHOUT-1051.patch
>
>
> Based upon our conversation with Jake on the user list, I have modified
> o.a.m.clustering.lda.cvb.InMemoryCollapsedVariationalBayes0.loadVectors so
> that it does not ignore document ids in the input. To preserve backwards
> compatibility, it behaves as it did earlier if a ClassCastException is
> thrown, which occurs when the ids are not integers and/or the document vector
> (or its getDelegate() if it is a NamedVector) cannot be cast to a
> RandomAccessSparseVector.