[ https://issues.apache.org/jira/browse/SPARK-9246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646554#comment-14646554 ]

Joseph K. Bradley commented on SPARK-9246:
------------------------------------------

If it's much easier or faster computationally, it's OK for topDocumentsPerTopic
to be approximate. (It should probably be analogous to describeTopics, so I
guess it will be approximate.) I think we should make both methods exact at
some point, which will take a little more work, but either is OK for now.

> DistributedLDAModel predict top docs per topic
> ----------------------------------------------
>
>                 Key: SPARK-9246
>                 URL: https://issues.apache.org/jira/browse/SPARK-9246
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Joseph K. Bradley
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> For each topic, return top documents based on topicDistributions.
> Synopsis:
> {code}
> /**
>  * @param maxDocuments  Max docs to return for each topic
>  * @return Array over topics of (sorted top docs, corresponding doc-topic weights)
>  */
> def topDocumentsPerTopic(maxDocuments: Int): Array[(Array[Long], Array[Double])]
> {code}
> Note: We will need to make sure that the above return value format is 
> Java-friendly.
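For illustration, here is a minimal in-memory sketch of what the synopsis asks for, assuming the topic distributions are already available as (document ID, topic distribution) pairs. It is not Spark's implementation: the object `TopDocsSketch` and the parameters `docs` and `numTopics` are hypothetical names introduced here, and the result is exact rather than the approximate variant discussed in the comment. It keeps one bounded min-heap per topic, so each topic costs O(n log maxDocuments) over n documents.

```scala
import scala.collection.mutable

// Hypothetical helper, not Spark's API: an exact, in-memory sketch of
// topDocumentsPerTopic over (docId, topicDistribution) pairs.
object TopDocsSketch {

  /**
   * @param docs          (document ID, topic distribution) pairs; every
   *                      distribution is assumed to have length numTopics
   * @param numTopics     number of topics
   * @param maxDocuments  max docs to return for each topic
   * @return array over topics of (doc IDs sorted by descending weight,
   *         corresponding doc-topic weights)
   */
  def topDocumentsPerTopic(
      docs: Seq[(Long, Array[Double])],
      numTopics: Int,
      maxDocuments: Int): Array[(Array[Long], Array[Double])] = {
    require(maxDocuments > 0, "maxDocuments must be positive")

    // Reversed ordering on weight: the queue's head is the smallest weight
    // kept so far, which is the one to evict when a heavier doc arrives.
    val minByWeight: Ordering[(Long, Double)] =
      Ordering.by[(Long, Double), Double](_._2).reverse
    val heaps = Array.fill(numTopics)(
      mutable.PriorityQueue.empty[(Long, Double)](minByWeight))

    for ((docId, dist) <- docs; topic <- 0 until numTopics) {
      val heap = heaps(topic)
      val weight = dist(topic)
      if (heap.size < maxDocuments) {
        heap.enqueue((docId, weight))
      } else if (weight > heap.head._2) {
        heap.dequeue()
        heap.enqueue((docId, weight))
      }
    }

    // Sort each topic's survivors by descending weight and split them into
    // the (doc IDs, weights) pair described in the synopsis.
    heaps.map { heap =>
      val sorted = heap.toArray.sortBy(-_._2)
      (sorted.map(_._1), sorted.map(_._2))
    }
  }
}
```

The return type mirrors the synopsis, `Array[(Array[Long], Array[Double])]`. In a distributed setting the same bounded-heap idea could be applied per partition and the partial heaps merged, but that is an assumption about how one might scale it, not a description of the eventual implementation.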



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
