[ https://issues.apache.org/jira/browse/SPARK-9246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646554#comment-14646554 ]
Joseph K. Bradley commented on SPARK-9246:
------------------------------------------
If an approximate result is much easier or faster to compute, it's OK for
this to be approximate. (It should probably be analogous to describeTopics,
so I expect it will be approximate.) I think we should make both exact at
some point, with a little more work, but either is OK for now.
> DistributedLDAModel predict top docs per topic
> ----------------------------------------------
>
> Key: SPARK-9246
> URL: https://issues.apache.org/jira/browse/SPARK-9246
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Reporter: Joseph K. Bradley
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> For each topic, return top documents based on topicDistributions.
> Synopsis:
> {code}
> /**
>  * @param maxDocuments Max docs to return for each topic
>  * @return Array over topics of (sorted top docs, corresponding doc-topic
>  *         weights)
>  */
> def topDocumentsPerTopic(maxDocuments: Int): Array[(Array[Long], Array[Double])]
> {code}
> Note: We will need to make sure that the above return value format is
> Java-friendly.
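
For illustration only, here is a minimal sketch of the exact version of the computation described above, using plain Scala collections in place of the distributed topicDistributions data. The object name TopDocsSketch and the extra parameters topicDistributions and k are assumptions made so the example is self-contained; they are not part of the proposed signature, and this is not the Spark implementation.

```scala
// Hypothetical stand-alone sketch: a local Seq stands in for the distributed
// collection of (document ID, topic distribution) pairs.
object TopDocsSketch {
  /**
   * @param topicDistributions (docId, per-topic weights) pairs; every weight
   *                           array is assumed to have length k
   * @param k                  number of topics (assumed known to the caller)
   * @param maxDocuments       max docs to return for each topic
   * @return array over topics of (doc IDs sorted by descending weight,
   *         corresponding doc-topic weights)
   */
  def topDocumentsPerTopic(
      topicDistributions: Seq[(Long, Array[Double])],
      k: Int,
      maxDocuments: Int): Array[(Array[Long], Array[Double])] = {
    Array.tabulate(k) { topic =>
      // Exact approach: sort all documents by their weight for this topic,
      // highest first, then keep the first maxDocuments entries.
      val top = topicDistributions
        .map { case (docId, weights) => (docId, weights(topic)) }
        .sortBy { case (_, weight) => -weight }
        .take(maxDocuments)
      (top.map(_._1).toArray, top.map(_._2).toArray)
    }
  }
}
```

A full sort per topic is the simplest exact approach but does not scale to a large corpus; a distributed version would more likely keep only a bounded number of candidates per topic while scanning the documents. The nested tuple-of-arrays return type is also the part that would need checking from Java, as the note above says.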
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)