The LDA output does not include the topic-probability distribution per document
(p(z|d)). It outputs only the topics and corresponding words.
---------------------------------------------------------------------------------------------------------------------------------------------
Key: MAHOUT-458
URL: https://issues.apache.org/jira/browse/MAHOUT-458
Project: Mahout
Issue Type: Improvement
Components: Clustering
Affects Versions: 0.4
Reporter: Himanshu Gahlot
Fix For: 0.4
The current implementation of LDA outputs only the topics and their words. Many
applications need the p(z|d) values of a document so that they can use this
vector as a reduced representation of the document (dimensionality reduction).
We need to introduce a new key that keeps track of the gamma values for each
document (as obtained from the document.infer() method) and writes them to the
output stream. PrintLDATopics should then output these values per document id.
Outputting the probabilities of the words in each topic would also make the
output more meaningful.
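
To illustrate the kind of per-document output requested above, here is a
minimal sketch of turning one document's gamma values into p(z|d). It assumes
gamma holds the positive variational Dirichlet parameters in linear (not log)
space, so that the expected topic proportions are gamma_k divided by the sum of
gamma. The class and method names (TopicProportions, normalize) and the sample
values are hypothetical and are not part of Mahout's API.

```java
import java.util.Arrays;

public class TopicProportions {

    /**
     * Normalizes one document's gamma vector into an estimate of p(z|d).
     * Assumption: gamma values are positive and in linear space.
     */
    static double[] normalize(double[] gamma) {
        double sum = 0.0;
        for (double g : gamma) {
            sum += g;
        }
        if (sum <= 0.0) {
            throw new IllegalArgumentException("gamma must have a positive sum");
        }
        double[] proportions = new double[gamma.length];
        for (int k = 0; k < gamma.length; k++) {
            proportions[k] = gamma[k] / sum;
        }
        return proportions;
    }

    public static void main(String[] args) {
        // Hypothetical gamma values for a single document with three topics.
        double[] gamma = {2.0, 6.0, 2.0};
        // One line per document: document id, then the topic proportions.
        System.out.println("doc-0\t" + Arrays.toString(normalize(gamma)));
    }
}
```

Writing one such vector per document id would give downstream applications the
reduced representation described above without requiring them to rerun
inference.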