[jira] [Created] (MAHOUT-682) The LDA output does not include the topic-probability distribution per document (p(z|d)). It outputs only the topics and corresponding words.

Jake Mannix (JIRA) Wed, 27 Apr 2011 15:47:43 -0700

The LDA output does not include the topic-probability distribution per document 
(p(z|d)). It outputs only the topics and corresponding words.
---------------------------------------------------------------------------------------------------------------------------------------------


                 Key: MAHOUT-682
                 URL: https://issues.apache.org/jira/browse/MAHOUT-682
             Project: Mahout
          Issue Type: Improvement
          Components: Clustering
    Affects Versions: 0.4
            Reporter: Himanshu Gahlot
            Assignee: Jake Mannix
             Fix For: 0.6
         Attachments: MAHOUT-458.patch, MAHOUT-458.patch

The current implementation of LDA outputs only the topics and their words. Many 
applications need a document's p(z|d) values so that this vector can serve as a 
reduced representation of the document (dimensionality reduction). We need to 
introduce a new key that keeps track of the gamma values for each document (as 
obtained from the document.infer() method) and writes them to the output stream; 
PrintLDATopics should then output these values per document id. Outputting the 
probabilities of the words in each topic would also make the output more 
meaningful.
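A minimal sketch of the two conversions requested above. It assumes the per-document gamma values are available as a plain double[] with one entry per topic, and that a topic's word weights are available as unnormalized log-probabilities. Both inputs are assumptions, and the class and method names (LdaDistributions, topicDistribution, wordDistribution) are hypothetical, not Mahout API. Normalizing gamma gives the expected topic proportions under the variational Dirichlet, a common estimate of p(z|d).

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not part of Mahout): turns LDA inference output
 * into normalized probability distributions.
 */
public class LdaDistributions {

    /** p(z|d): normalize the variational Dirichlet parameters gamma so they sum to 1. */
    public static double[] topicDistribution(double[] gamma) {
        double sum = 0.0;
        for (double g : gamma) {
            sum += g;
        }
        double[] pZGivenD = new double[gamma.length];
        for (int k = 0; k < gamma.length; k++) {
            pZGivenD[k] = gamma[k] / sum;
        }
        return pZGivenD;
    }

    /**
     * p(w|z): exponentiate (assumed) log weights and normalize them.
     * Subtracting the maximum first avoids underflow for very negative logs.
     */
    public static double[] wordDistribution(double[] logWeights) {
        double max = Double.NEGATIVE_INFINITY;
        for (double lw : logWeights) {
            max = Math.max(max, lw);
        }
        double sum = 0.0;
        double[] p = new double[logWeights.length];
        for (int w = 0; w < logWeights.length; w++) {
            p[w] = Math.exp(logWeights[w] - max);
            sum += p[w];
        }
        for (int w = 0; w < p.length; w++) {
            p[w] /= sum;
        }
        return p;
    }

    public static void main(String[] args) {
        // Made-up gamma values for one document and three topics.
        double[] gamma = {2.0, 1.0, 1.0};
        System.out.println("doc-1\t" + Arrays.toString(topicDistribution(gamma)));

        // Made-up log weights for four words in a single topic.
        double[] logWeights = {Math.log(4.0), Math.log(2.0), Math.log(1.0), Math.log(1.0)};
        System.out.println(Arrays.toString(wordDistribution(logWeights)));
    }
}
```

The first line printed by main is `doc-1` followed by a tab and `[0.5, 0.25, 0.25]`, i.e. one tab-separated row of p(z|d) per document id, the kind of output PrintLDATopics could emit.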

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
