[jira] [Work logged] (MAHOUT-682) The LDA output does not include the topic-probability distribution per document (p(z|d)). It outputs only the topics and corresponding words.

ASF GitHub Bot (Jira) Tue, 16 Dec 2025 01:20:08 -0800


     [ 
https://issues.apache.org/jira/browse/MAHOUT-682?focusedWorklogId=996404&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-996404
 ]


ASF GitHub Bot logged work on MAHOUT-682:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/Dec/25 09:19
            Start Date: 16/Dec/25 09:19
    Worklog Time Spent: 10m 
      Work Description: guan404ming merged PR #724:
URL: https://github.com/apache/mahout/pull/724




Issue Time Tracking
-------------------

    Worklog Id:     (was: 996404)
    Time Spent: 1h  (was: 50m)

> The LDA output does not include the topic-probability distribution per 
> document (p(z|d)). It outputs only the topics and corresponding words.
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-682
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-682
>             Project: Mahout
>          Issue Type: Improvement
>          Components: classic
>    Affects Versions: 0.4
>            Reporter: Himanshu Gahlot
>            Assignee: Jake Mannix
>            Priority: Major
>             Fix For: 0.5
>
>         Attachments: ASF.LICENSE.NOT.GRANTED--MAHOUT-458.patch, 
> ASF.LICENSE.NOT.GRANTED--MAHOUT-458.patch
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> The current implementation of LDA outputs only topics and their words. Many 
> applications need the p(z|d) values of a document so that this vector can 
> serve as a reduced representation of the document (dimensionality 
> reduction). We need to introduce a new key that keeps track of the gamma 
> values for each document (as obtained from the document.infer() method) and 
> writes them to the output stream. PrintLDATopics should then output these 
> values per document id. Outputting the probabilities of words in a topic 
> would also make the output more meaningful.
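
As a rough illustration of the request above: in variational LDA, the per-document gamma vector is the parameter of a Dirichlet over topics, so one common way to approximate p(z|d) is to normalize gamma so that it sums to 1. The sketch below assumes gamma is available as a plain double[]; the class and method names (TopicDistribution, normalizeGamma) are hypothetical and are not part of Mahout's API.

```java
public final class TopicDistribution {

    private TopicDistribution() {
    }

    /**
     * Hypothetical helper (not a Mahout method): turns a document's
     * variational gamma vector into an estimate of p(z|d) by dividing
     * each entry by the sum of all entries.
     */
    public static double[] normalizeGamma(double[] gamma) {
        double sum = 0.0;
        for (double g : gamma) {
            sum += g;
        }
        // Rejects zero, negative, and NaN totals, which cannot be normalized.
        if (!(sum > 0.0)) {
            throw new IllegalArgumentException("gamma must have a positive sum");
        }
        double[] topicProbabilities = new double[gamma.length];
        for (int k = 0; k < gamma.length; k++) {
            topicProbabilities[k] = gamma[k] / sum;
        }
        return topicProbabilities;
    }
}
```

The resulting vector has one entry per topic and could be written out keyed by document id, which is the reduced document representation the description asks for.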



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
