Joseph K. Bradley created SPARK-9245:
----------------------------------------
Summary: DistributedLDAModel predict top topic per doc-term instance
Key: SPARK-9245
URL: https://issues.apache.org/jira/browse/SPARK-9245
Project: Spark
Issue Type: New Feature
Components: MLlib
Reporter: Joseph K. Bradley
For each (document, term) pair, return the top topic. Note that instances of a
(doc, term) pair within a document (a.k.a. "tokens") are exchangeable, so we
should provide one estimate per document-term pair, rather than per token.
Synopsis for DistributedLDAModel:
{code}
/** @return RDD of (doc ID, vector of top topic index for each term) */
def topTopicAssignments: RDD[(Long, Vector)]
{code}
Note that using Vector gives us a sparse encoding that is also Java-friendly.
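A rough sketch of the per-(doc, term) estimate, in plain Scala without Spark so it runs on its own. It assumes point estimates are already available for a document's topic weights and for each topic's term weights, and scores topic k for term w as docTopics(k) * topicTerms(k)(w). The object and method names are hypothetical, not part of MLlib, and DistributedLDAModel may well compute the estimate differently.

```scala
// Hypothetical helper for illustration only; not an MLlib API.
object TopTopicSketch {

  /** Index of the largest value. Assumes a non-empty array. */
  def argmax(values: Array[Double]): Int =
    values.indices.maxBy(i => values(i))

  /**
   * Top topic for each distinct term of a single document.
   *
   * @param docTopics   docTopics(k) = estimated weight of topic k in the document
   * @param topicTerms  topicTerms(k)(w) = estimated weight of term w in topic k
   * @param termIndices distinct term indices that occur in the document
   * @return (term index, top topic index) pairs, one per distinct term
   */
  def topTopicPerTerm(
      docTopics: Array[Double],
      topicTerms: Array[Array[Double]],
      termIndices: Array[Int]): Array[(Int, Int)] =
    termIndices.map { w =>
      // Unnormalized score of each topic for term w in this document.
      // Every token of term w shares this score, so one estimate per
      // (doc, term) pair is enough.
      val scores = docTopics.indices.map(k => docTopics(k) * topicTerms(k)(w)).toArray
      (w, argmax(scores))
    }
}
```

With docTopics = Array(0.6, 0.4) and topicTerms = Array(Array(0.5, 0.1, 0.4), Array(0.1, 0.8, 0.1)), terms 0 and 2 get topic 0 and term 1 gets topic 1. The (term index, topic index) pairs map directly onto the indices and values of a sparse Vector.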