[jira] [Work logged] (MAHOUT-682) The LDA output does not include the topic-probability distribution per document (p(z|d)). It outputs only the topics and corresponding words.

ASF GitHub Bot (Jira) Fri, 12 Dec 2025 02:51:05 -0800


     [ 
https://issues.apache.org/jira/browse/MAHOUT-682?focusedWorklogId=995944&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-995944
 ]


ASF GitHub Bot logged work on MAHOUT-682:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 12/Dec/25 10:50
            Start Date: 12/Dec/25 10:50
    Worklog Time Spent: 10m 
      Work Description: shiavm006 opened a new pull request, #724:
URL: https://github.com/apache/mahout/pull/724

   ### Purpose of PR
   Document the missing QuMat APIs: `apply_cswap_gate`, `apply_t_gate`, 
`get_final_state_vector`, `apply_u_gate`, `swap_test`, `measure_overlap`. Align 
the API docs with the methods currently implemented in `qumat.QuMat`.
   
   ### Related Issues or PRs
   Closes #682
   
   ### Changes Made
   <!

Issue Time Tracking
-------------------

            Worklog Id:     (was: 995944)
    Remaining Estimate: 0h
            Time Spent: 10m

> The LDA output does not include the topic-probability distribution per 
> document (p(z|d)). It outputs only the topics and corresponding words.
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-682
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-682
>             Project: Mahout
>          Issue Type: Improvement
>          Components: classic
>    Affects Versions: 0.4
>            Reporter: Himanshu Gahlot
>            Assignee: Jake Mannix
>            Priority: Major
>             Fix For: 0.5
>
>         Attachments: ASF.LICENSE.NOT.GRANTED--MAHOUT-458.patch, 
> ASF.LICENSE.NOT.GRANTED--MAHOUT-458.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The current implementation of LDA outputs only topics and their words. Many 
> applications need the p(z|d) values of a document so that they can use this 
> vector as a reduced representation of the document (dimensionality 
> reduction). We need to introduce a new key to keep track of the gamma values 
> for each document (as obtained from the document.infer() method), write 
> these values to the output stream, and finally have PrintLDATopics output 
> them per document id. Outputting the probabilities of words in a topic would 
> also make the output more meaningful.
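
As a rough illustration of the request above, the per-document topic distribution can be estimated by normalizing a document's gamma vector so that it sums to 1. This is a minimal sketch, not Mahout code: it assumes gamma holds the variational Dirichlet parameters for one document, and the names `topic_distribution` and `doc_gammas`, as well as the sample values, are hypothetical.

```python
from typing import Dict, List


def topic_distribution(gamma: List[float]) -> List[float]:
    """Normalize a gamma vector into an estimate of p(z|d).

    Assumption: gamma contains the (positive) variational Dirichlet
    parameters inferred for a single document.
    """
    total = sum(gamma)
    if total <= 0:
        raise ValueError("gamma must have a positive sum")
    return [g / total for g in gamma]


# Hypothetical gamma vectors keyed by document id (illustrative values only).
doc_gammas: Dict[str, List[float]] = {
    "doc-1": [2.0, 1.0, 1.0],
    "doc-2": [0.5, 0.5, 3.0],
}

# One topic-probability vector per document id, usable as a reduced
# representation of the document.
doc_topics = {doc_id: topic_distribution(g) for doc_id, g in doc_gammas.items()}

for doc_id, dist in doc_topics.items():
    print(doc_id, [round(p, 3) for p in dist])
```

Keying the result by document id mirrors the request that the values be reported per document id; how the key is actually represented in the job output is left open here.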



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
