[
https://issues.apache.org/jira/browse/SPARK-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599035#comment-14599035
]
PJ Van Aeken commented on SPARK-8565:
-------------------------------------
Ok, caching the source RDD works. But wouldn't the tf.cache() call described in the
TF-IDF documentation already materialize the Elasticsearch (ES) source?
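
For reference, a minimal sketch of the two caching points being discussed, assuming the
documents come from a lazily evaluated Elasticsearch read. The loadFromEs loader below is
a hypothetical placeholder, not part of the actual job:

{code:scala}
import org.apache.spark.SparkContext
import org.apache.spark.mllib.feature.{HashingTF, IDF}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// `loadFromEs` is a hypothetical placeholder for however the job reads
// the documents out of Elasticsearch.
def tfIdf(sc: SparkContext, loadFromEs: SparkContext => RDD[Seq[String]]): RDD[Vector] = {
  // Caching the *source* RDD (the workaround from the earlier comment)
  // pins down the Elasticsearch read so it is not re-evaluated later.
  val documents: RDD[Seq[String]] = loadFromEs(sc).cache()

  val hashingTF = new HashingTF()
  val tf: RDD[Vector] = hashingTF.transform(documents)

  // The MLlib TF-IDF docs recommend caching `tf`, since IDF makes two
  // passes over it: one to fit the model and one to transform.
  tf.cache()
  val idf = new IDF().fit(tf)
  idf.transform(tf)
}
{code}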
> TF-IDF drops records
> --------------------
>
> Key: SPARK-8565
> URL: https://issues.apache.org/jira/browse/SPARK-8565
> Project: Spark
> Issue Type: Bug
> Components: MLlib
> Affects Versions: 1.3.1
> Reporter: PJ Van Aeken
>
> When applying TF-IDF to an RDD[Seq[String]] with 1213 records, I get back an
> RDD[Vector] with only 1204 records. This prevents me from zipping it with the
> original RDD so that I can reattach the document ids.
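
For context, a sketch of the zip step the description refers to, using hypothetical names
(docIds for the document ids, tfidfVectors for the TF-IDF output). RDD.zip assumes both
sides have the same number of partitions and the same number of elements per partition,
so a 1213-vs-1204 count mismatch makes this step fail:

{code:scala}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// Hypothetical sketch: pair each document id back with its TF-IDF vector.
// zip requires identical partitioning and element counts on both sides,
// so records dropped on one side break this step.
def reattachIds(docIds: RDD[String], tfidfVectors: RDD[Vector]): RDD[(String, Vector)] =
  docIds.zip(tfidfVectors)
{code}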