[ https://issues.apache.org/jira/browse/SPARK-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597900#comment-14597900 ]
PJ Van Aeken commented on SPARK-8565:
-------------------------------------
No, but the data comes from ElasticSearch indices which are continuously filled.
Could it be that more records are added while TF-IDF is running?
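
If that hypothesis holds, the input RDD would be re-read from the source each time an action runs (once to fit the IDF model, again to transform, and again for the zip), so the counts could differ between passes. A minimal sketch of one way to sidestep the zip, assuming Spark 1.3.x MLlib: keep the document id in the same record as its terms and persist the term-frequency RDD, so both passes work from the same data. The object and function names below (TfIdfWithIds, tfidfWithIds) are hypothetical, and persisting is best-effort, so this reduces rather than rules out re-reading the source.

```scala
import org.apache.spark.SparkContext._ // pair-RDD implicits (harmless on 1.3+)
import org.apache.spark.mllib.feature.{HashingTF, IDF}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Hypothetical helper, not part of Spark.
object TfIdfWithIds {

  // docs: (document id, tokenized terms). The id travels with the record,
  // so no zip against the original RDD is needed afterwards.
  def tfidfWithIds(docs: RDD[(String, Seq[String])]): RDD[(String, Vector)] = {
    val hashingTF = new HashingTF()

    // Hash each document's terms into a term-frequency vector, keeping the id.
    val tf = docs.mapValues(terms => hashingTF.transform(terms))

    // Persist so that fitting and transforming reuse the same materialized
    // data instead of going back to the (possibly growing) source.
    tf.persist(StorageLevel.MEMORY_AND_DISK)

    // First pass: compute inverse document frequencies over all vectors.
    val idfModel = new IDF().fit(tf.values)

    // Second pass: rescale each vector, still paired with its id.
    tf.mapValues(v => idfModel.transform(v))
  }
}
```

Given a SparkContext sc, this could be called as TfIdfWithIds.tfidfWithIds(sc.parallelize(Seq("doc1" -> Seq("a", "b")))). The single-vector idfModel.transform(v) overload ships the model inside the task closure rather than as a broadcast, which should be acceptable for a sketch but may matter with a very large feature space.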
> TF-IDF drops records
> --------------------
>
> Key: SPARK-8565
> URL: https://issues.apache.org/jira/browse/SPARK-8565
> Project: Spark
> Issue Type: Bug
> Components: MLlib
> Affects Versions: 1.3.1
> Reporter: PJ Van Aeken
>
> When applying TF-IDF on an RDD[Seq[String]] with 1213 records, I get an
> RDD[Vector] back with only 1204 records. This prevents me from zipping it
> with the original RDD to reattach the document ids.