[ https://issues.apache.org/jira/browse/SPARK-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200531#comment-14200531 ]
Matei Zaharia commented on SPARK-993:
-------------------------------------
Arun, you'd see this issue if you call collect() or take() and then println
the results. The problem is that the same Writable object (a Text, for
example) is referenced for all records in the dataset. The counts will be
okay.
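
To make the aliasing concrete, here is a minimal stand-alone sketch in plain
Java. It does not use Spark or Hadoop: MutableText and reusingReader are
hypothetical stand-ins for a mutable Writable such as Text and for a record
reader that overwrites one shared object per record. It is only meant to
illustrate why held references look wrong while counts stay right, not to
reproduce HadoopRDD's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Stand-alone simulation; nothing here is a Spark or Hadoop API.
class WritableReuseDemo {

    // Hypothetical stand-in for a mutable Hadoop Writable such as Text.
    static final class MutableText {
        private String value = "";
        void set(String v) { value = v; }
        @Override public String toString() { return value; }
    }

    // Mimics a reader that overwrites ONE shared object for every record.
    static Iterator<MutableText> reusingReader(List<String> lines) {
        final MutableText shared = new MutableText();
        final Iterator<String> it = lines.iterator();
        return new Iterator<MutableText>() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public MutableText next() {
                shared.set(it.next());   // mutate in place
                return shared;           // hand out the same reference again
            }
        };
    }

    // Holds on to the references, as collecting without a copy would.
    static List<String> collectByReference(List<String> lines) {
        List<MutableText> held = new ArrayList<>();
        Iterator<MutableText> records = reusingReader(lines);
        while (records.hasNext()) held.add(records.next());
        List<String> out = new ArrayList<>();
        for (MutableText t : held) out.add(t.toString());
        return out;
    }

    // Copies each record into an immutable String before keeping it.
    static List<String> collectByCopy(List<String> lines) {
        List<String> out = new ArrayList<>();
        Iterator<MutableText> records = reusingReader(lines);
        while (records.hasNext()) out.add(records.next().toString());
        return out;
    }

    // Counting never looks at the record contents, so reuse is harmless.
    static long count(List<String> lines) {
        long n = 0;
        Iterator<MutableText> records = reusingReader(lines);
        while (records.hasNext()) { records.next(); n++; }
        return n;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a", "b", "c");
        System.out.println(collectByReference(lines)); // [c, c, c]
        System.out.println(collectByCopy(lines));      // [a, b, c]
        System.out.println(count(lines));              // 3
    }
}
```

In a real Spark job the analogous workaround is to copy each record before
holding on to it, for example by mapping the Text value to a String ahead of
collect() or take().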
> Don't reuse Writable objects in HadoopRDDs by default
> -----------------------------------------------------
>
> Key: SPARK-993
> URL: https://issues.apache.org/jira/browse/SPARK-993
> Project: Spark
> Issue Type: Improvement
> Reporter: Matei Zaharia
>
> Right now we reuse Writable objects as an optimization, which leads to weird
> results when you call collect() on a file with distinct items: the collected
> records can all refer to the same object. We should instead make that
> behavior optional through a flag.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)