[ https://issues.apache.org/jira/browse/SPARK-22328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-22328:
------------------------------------

    Assignee:     (was: Apache Spark)

> ClosureCleaner misses referenced superclass fields, gives them null values
> --------------------------------------------------------------------------
>
>                 Key: SPARK-22328
>                 URL: https://issues.apache.org/jira/browse/SPARK-22328
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Ryan Williams
>
> [Runnable repro here|https://github.com/ryan-williams/spark-bugs/tree/closure]:
> Superclass with some fields:
> {code}
> abstract class App extends Serializable {
>
>   // SparkContext stub
>   @transient lazy val sc =
>     new SparkContext(
>       new SparkConf()
>         .setAppName("test")
>         .setMaster("local[4]")
>         .set("spark.ui.showConsoleProgress", "false")
>     )
>
>   // These fields get missed by the ClosureCleaner in some situations
>   val n1 = 111
>   val s1 = "aaa"
>
>   // Simple scaffolding to exercise passing a closure to RDD.foreach in subclasses
>   def rdd = sc.parallelize(1 to 1)
>   def run(name: String): Unit = {
>     print(s"$name:\t")
>     body()
>     sc.stop()
>   }
>   def body(): Unit
> }
> {code}
> Running a simple Spark job with various instantiations of this class:
> {code}
> object Main {
>
>   /** [[App]]s generated this way will not correctly detect references to [[App.n1]] in Spark closures */
>   val fn = () ⇒ new App {
>     val n2 = 222
>     val s2 = "bbb"
>     def body(): Unit = rdd.foreach { _ ⇒ println(s"$n1, $n2, $s1, $s2") }
>   }
>
>   /** Doesn't serialize closures correctly */
>   val app1 = fn()
>
>   /** Works fine */
>   val app2 =
>     new App {
>       val n2 = 222
>       val s2 = "bbb"
>       def body(): Unit = rdd.foreach { _ ⇒ println(s"$n1, $n2, $s1, $s2") }
>     }
>
>   /** [[App]]s created this way also work fine */
>   def makeApp(): App =
>     new App {
>       val n2 = 222
>       val s2 = "bbb"
>       def body(): Unit = rdd.foreach { _ ⇒ println(s"$n1, $n2, $s1, $s2") }
>     }
>
>   val app3 = makeApp()       // ok
>   val fn2 = () ⇒ makeApp()   // ok
>
>   def main(args: Array[String]): Unit = {
>     fn().run("fn")     // bad: n1 → 0, s1 → null
>     app1.run("app1")   // bad: n1 → 0, s1 → null
>     app2.run("app2")   // ok
>     app3.run("app3")   // ok
>     fn2().run("fn2")   // ok
>   }
> }
> {code}
> Build + Run:
> {code}
> $ sbt run
> …
> fn: 0, 222, null, bbb
> app1: 0, 222, null, bbb
> app2: 111, 222, aaa, bbb
> app3: 111, 222, aaa, bbb
> fn2: 111, 222, aaa, bbb
> {code}
> The first two versions print {{0}} and {{null}}, respectively, for the {{App.n1}} and {{App.s1}} fields.
> Something about this syntax causes the problem:
> {code}
> () ⇒ new App { … }
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
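The issue title points at a plausible mechanism: reflection's `getDeclaredFields()` returns only the fields a class itself declares, never inherited ones, so a field walker that does not also climb `getSuperclass()` will miss superclass state like `App.n1`. A minimal Spark-free Java sketch of that reflection behavior (the `Base`/`FieldDemo` names are hypothetical stand-ins for the `App` hierarchy in the repro, not Spark code):

```java
import java.io.Serializable;
import java.lang.reflect.Field;
import java.util.Arrays;

// Stand-in for the repro's App: a field declared on a serializable
// superclass, then referenced through an anonymous subclass.
abstract class Base implements Serializable {
    final int n1 = 111; // declared on the superclass, like App.n1
}

public class FieldDemo {
    public static void main(String[] args) {
        Base inst = new Base() {
            final int n2 = 222; // declared on the anonymous subclass
        };

        // getDeclaredFields() on the anonymous class omits inherited fields,
        // so a walker that stops at the concrete class never sees n1.
        boolean onSubclass = Arrays.stream(inst.getClass().getDeclaredFields())
                .map(Field::getName)
                .anyMatch("n1"::equals);
        // Climbing to the superclass is required to find it.
        boolean onSuperclass = Arrays.stream(Base.class.getDeclaredFields())
                .map(Field::getName)
                .anyMatch("n1"::equals);

        System.out.println("n1 declared on anonymous subclass: " + onSubclass);   // false
        System.out.println("n1 declared on superclass:         " + onSuperclass); // true
    }
}
```

This also matches the workaround the repro already demonstrates: routing construction through a named method (`makeApp()`) instead of the inline `() ⇒ new App { … }` avoids the mis-cleaned capture.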