[ 
https://issues.apache.org/jira/browse/SPARK-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061501#comment-14061501
 ] 

Kan Zhang commented on SPARK-1866:
----------------------------------

My previous comment may be less readable, let me try again:

The root cause is when the class for line {{sc.parallelize()...}} is generated, 
variable {{instances}} defined in the preceding line gets imported by the 
parser (since it thinks {{instances}} is referenced by this line) and becomes 
part of the outer object for the closure. This outer object is referenced by 
the closure through variable {{x}}. However, currently we choose not to null 
(or clone) outer objects when we clean closures since we can't be sure it is 
safe to do so (see commit 
[f346e64|https://github.com/apache/spark/commit/f346e64637fa4f9bd95fcc966caa496bea5feca0]).
 As a result, {{instances}} is not nulled by ClosureCleaner even though it is 
not actually used within the closure. This type of exception will pop up 
whenever a closure references outer objects that are not serializable.


> Closure cleaner does not null shadowed fields when outer scope is referenced
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-1866
>                 URL: https://issues.apache.org/jira/browse/SPARK-1866
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Aaron Davidson
>            Assignee: Kan Zhang
>            Priority: Critical
>             Fix For: 1.0.1, 1.1.0
>
>
> Take the following example:
> {code}
> val x = 5
> val instances = new org.apache.hadoop.fs.Path("/") /* non-serializable */
> sc.parallelize(0 until 10).map { _ =>
>   val instances = 3
>   (instances, x)
> }.collect
> {code}
> This produces a "java.io.NotSerializableException: 
> org.apache.hadoop.fs.Path", despite the fact that the outer instances is not 
> actually used within the closure. If you change the name of the outer 
> variable instances to something else, the code executes correctly, indicating 
> that it is the fact that the two variables share a name that causes the issue.
> Additionally, if the outer scope is not used (i.e., we do not reference "x" 
> in the above example), the issue does not appear.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to