[
https://issues.apache.org/jira/browse/SPARK-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kan Zhang updated SPARK-1866:
-----------------------------
Comment: was deleted
(was: Unfortunately, this type of error will pop up whenever a closure
references user objects (any objects other than nested closure objects) that
are not serializable. Our current approach is not to clone (or null) user
objects, since we can't be sure it is safe to do so (see commit
f346e64637fa4f9bd95fcc966caa496bea5feca0).
Spark shell synthesizes a class for each line. In this case, the class for the
closure line imports {{instances}} as a field (since the parser thinks it is
referenced by this line), and the corresponding line object is referenced by
the closure via {{x}}.
My take is to advise users to avoid name collisions as a workaround.)
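The failure mode can be probed outside Spark with a plain Java-serialization check, the same kind of test applied to a cleaned closure before shipping it. A minimal sketch; {{NotSer}} is a hypothetical stand-in for a non-serializable class such as {{org.apache.hadoop.fs.Path}}:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a non-serializable class like org.apache.hadoop.fs.Path.
class NotSer(val name: String)

object ClosureCheck {
  // Java-serialization probe: true iff `obj` round-trips through writeObject.
  def isSerializable(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val x = 5
    val outer = new NotSer("outer")

    // A closure that really references `outer` drags it in and fails the probe.
    val capturesOuter = () => (outer.name.length, x)
    // A closure that captures only the Int serializes fine.
    val capturesOnlyX = () => (3, x)

    // Note: in compiled Scala, a closure that merely shadows a name captures
    // only what it uses; the shell failure arises because the synthesized
    // line object (holding `instances`) is referenced via `x`.
    println(isSerializable(capturesOuter)) // expected: false
    println(isSerializable(capturesOnlyX)) // expected: true
  }
}
```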
> Closure cleaner does not null shadowed fields when outer scope is referenced
> ----------------------------------------------------------------------------
>
> Key: SPARK-1866
> URL: https://issues.apache.org/jira/browse/SPARK-1866
> Project: Spark
> Issue Type: Bug
> Affects Versions: 1.0.0
> Reporter: Aaron Davidson
> Assignee: Kan Zhang
> Priority: Critical
> Fix For: 1.0.1, 1.1.0
>
>
> Take the following example:
> {code}
> val x = 5
> val instances = new org.apache.hadoop.fs.Path("/") /* non-serializable */
> sc.parallelize(0 until 10).map { _ =>
>   val instances = 3
>   (instances, x)
> }.collect
> {code}
> This produces a {{java.io.NotSerializableException:
> org.apache.hadoop.fs.Path}}, despite the fact that the outer {{instances}}
> is never actually used within the closure. If you rename the outer variable
> {{instances}} to anything else, the code executes correctly, indicating that
> the name collision between the two variables is what causes the issue.
> Additionally, if the outer scope is not referenced (i.e., if we do not use
> {{x}} in the above example), the issue does not appear.
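> The reported workaround is renaming the outer variable so the names no
> longer collide (a sketch of the same shell session; {{path}} is an
> arbitrary replacement name):
> {code}
> val x = 5
> val path = new org.apache.hadoop.fs.Path("/") /* non-serializable */
> sc.parallelize(0 until 10).map { _ =>
>   val instances = 3
>   (instances, x)
> }.collect
> {code}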
--
This message was sent by Atlassian JIRA
(v6.2#6252)