[ 
https://issues.apache.org/jira/browse/MAHOUT-682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031961#comment-13031961
 ] 

Jake Mannix commented on MAHOUT-682:
------------------------------------

runIterationSequential has not been finished and is never called (that's why 
it's commented out), so don't use it.

If you want to work on getting sequential iteration (i.e. LDA without Hadoop) 
working, that would be great, but please put any code on a new JIRA ticket.

> The LDA output does not include the topic-probability distribution per 
> document (p(z|d)). It outputs only the topics and corresponding words.
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-682
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-682
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.4
>            Reporter: Himanshu Gahlot
>            Assignee: Jake Mannix
>             Fix For: 0.6
>
>         Attachments: MAHOUT-458.patch, MAHOUT-458.patch
>
>
> The current implementation of LDA outputs only topics and their words. Many 
> applications need the p(z|d) values of a document so that this vector can 
> serve as a reduced representation of the document (dimensionality 
> reduction). We need to introduce a new key that keeps track of the gamma 
> values for each document (as obtained from the document.infer() method) and 
> writes them to the output stream; PrintLDATopics should then output 
> these values per document id. Outputting the probabilities of words in 
> a topic would also make the output more meaningful.
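
To illustrate the p(z|d) values the description asks for: in variational LDA, 
the expected topic proportions for a document are usually taken to be its 
gamma vector normalized to sum to 1. The sketch below shows only that 
normalization step. The class GammaToTopicDistribution and its normalize 
method are hypothetical names, not part of Mahout, and it assumes the gamma 
values are already available as a plain double[] (how document.infer() 
exposes them is not shown here).

```java
import java.util.Arrays;

// Hypothetical helper, NOT part of Mahout: converts the per-document
// variational gamma values into a topic distribution p(z|d).
public final class GammaToTopicDistribution {

    private GammaToTopicDistribution() {}

    /**
     * Returns gamma scaled so that its entries sum to 1.
     * Assumes gamma holds one non-negative value per topic.
     */
    public static double[] normalize(double[] gamma) {
        double sum = 0.0;
        for (double g : gamma) {
            if (Double.isNaN(g) || g < 0.0) {
                throw new IllegalArgumentException("gamma values must be non-negative");
            }
            sum += g;
        }
        if (sum <= 0.0) {
            throw new IllegalArgumentException("gamma must have a positive sum");
        }
        double[] topicDistribution = new double[gamma.length];
        for (int k = 0; k < gamma.length; k++) {
            topicDistribution[k] = gamma[k] / sum;
        }
        return topicDistribution;
    }

    public static void main(String[] args) {
        // Made-up gamma values for a two-topic model, for illustration only.
        double[] pzd = normalize(new double[] {1.0, 3.0});
        System.out.println(Arrays.toString(pzd));
    }
}
```

The resulting vector has one entry per topic, so it can be written out per 
document id and used as the reduced representation mentioned above.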

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
