[
https://issues.apache.org/jira/browse/SPARK-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061501#comment-14061501
]
Kan Zhang commented on SPARK-1866:
----------------------------------
My previous comment may be less readable, let me try again:
The root cause is when the class for line {{sc.parallelize()...}} is generated,
variable {{instances}} defined in the preceding line gets imported by the
parser (since it thinks {{instances}} is referenced by this line) and becomes
part of the outer object for the closure. This outer object is referenced by
the closure through variable {{x}}. However, currently we choose not to null
(or clone) outer objects when we clean closures since we can't be sure it is
safe to do so (see commit
[f346e64|https://github.com/apache/spark/commit/f346e64637fa4f9bd95fcc966caa496bea5feca0]).
As a result, {{instances}} is not nulled by ClosureCleaner even though it is
not actually used within the closure. This type of exception will pop up
whenever a closure references outer objects that are not serializable.
> Closure cleaner does not null shadowed fields when outer scope is referenced
> ----------------------------------------------------------------------------
>
> Key: SPARK-1866
> URL: https://issues.apache.org/jira/browse/SPARK-1866
> Project: Spark
> Issue Type: Bug
> Affects Versions: 1.0.0
> Reporter: Aaron Davidson
> Assignee: Kan Zhang
> Priority: Critical
> Fix For: 1.0.1, 1.1.0
>
>
> Take the following example:
> {code}
> val x = 5
> val instances = new org.apache.hadoop.fs.Path("/") /* non-serializable */
> sc.parallelize(0 until 10).map { _ =>
> val instances = 3
> (instances, x)
> }.collect
> {code}
> This produces a "java.io.NotSerializableException:
> org.apache.hadoop.fs.Path", despite the fact that the outer instances is not
> actually used within the closure. If you change the name of the outer
> variable instances to something else, the code executes correctly, indicating
> that it is the fact that the two variables share a name that causes the issue.
> Additionally, if the outer scope is not used (i.e., we do not reference "x"
> in the above example), the issue does not appear.
--
This message was sent by Atlassian JIRA
(v6.2#6252)