[jira] [Commented] (SPARK-993) Don't reuse Writable objects in HadoopRDDs by default

Matei Zaharia (JIRA) Thu, 06 Nov 2014 09:41:01 -0800

    [ https://issues.apache.org/jira/browse/SPARK-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200531#comment-14200531 ]


Matei Zaharia commented on SPARK-993:
-------------------------------------

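Arun, you'd see this issue if you call collect() or take() and then println 
the results. The problem is that the same Writable object (a Text, for 
example) is referenced by all records in the collected dataset, so the 
elements all print whatever value was read into that object last. The counts 
will still be correct.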
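
To illustrate the aliasing described above, here is a minimal sketch in plain Python. It does not use Spark or Hadoop: `MutableText` and `read_records` are hypothetical stand-ins for a reused Writable and a record reader, chosen only to show why counting is unaffected while a materialized collection is not.

```python
class MutableText:
    """Hypothetical stand-in for a reusable Writable such as Text."""

    def __init__(self):
        self.value = ""

    def set(self, value):
        self.value = value

    def __str__(self):
        return self.value


def read_records(lines):
    """Yield one record per line, mutating and re-yielding a single object."""
    shared = MutableText()
    for line in lines:
        shared.set(line)
        yield shared


lines = ["a", "b", "c"]

# Counting looks at records one at a time, so reuse is harmless here.
count = sum(1 for _ in read_records(lines))

# Materializing first (as collect() would) keeps three references to one
# object, so every element shows the last value written into it.
aliased = [str(r) for r in list(read_records(lines))]

# Converting each record to an immutable string while iterating keeps the
# values distinct.
copied = [str(r) for r in read_records(lines)]

print(count, aliased, copied)
```

In this sketch the fix is to copy (here, convert to a string) each record before holding on to it, which is the same idea as copying a reused object before collecting it.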

> Don't reuse Writable objects in HadoopRDDs by default
> -----------------------------------------------------
>
>                 Key: SPARK-993
>                 URL: https://issues.apache.org/jira/browse/SPARK-993
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: Matei Zaharia
>
> Right now we reuse them as an optimization, which leads to weird results when 
> you call collect() on a file with distinct items. We should instead make that 
> behavior optional through a flag.



