[ https://issues.apache.org/jira/browse/SPARK-22288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16207706#comment-16207706 ]
Ryan Williams commented on SPARK-22288:
---------------------------------------
Makes sense, fine with me to "Won't Fix".
bq. You can always use a different serializer like kryo
NB: this is closure-serialization, where [only Java serialization has ever worked, afaik|https://issues.apache.org/jira/browse/SPARK-12414].
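To illustrate (a sketch, assuming the standard {{spark.serializer}} setting): Kryo only replaces the data serializer, while task closures still go through the Java-based closure serializer, so the failure below is unaffected:
{code}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.master", "local[4]")
  .set("spark.app.name", "serde-test")
  // Switches shuffle/cache/broadcast data serialization to Kryo; closures
  // are still serialized with Java serialization, so the crash persists.
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
{code}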
> Tricky interaction between closure-serialization and inheritance results in
> confusing failure
> ---------------------------------------------------------------------------------------------
>
> Key: SPARK-22288
> URL: https://issues.apache.org/jira/browse/SPARK-22288
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.2.0
> Reporter: Ryan Williams
> Priority: Minor
>
> Documenting this since I've run into it a few times; [full repro / discussion here|https://github.com/ryan-williams/spark-bugs/tree/serde].
> Given 3 possible super-classes:
> {code}
> class Super1(n: Int)                       // not Serializable, no zero-arg constructor
> class Super2(n: Int) extends Serializable
> class Super3                               // not Serializable, but gets a default zero-arg constructor
> {code}
> A subclass that passes a closure to an RDD operation (e.g. {{map}} or
> {{filter}}), where the closure references one of the subclass's fields, will
> throw a {{java.io.InvalidClassException: …; no valid constructor}} exception
> when the subclass extends {{Super1}}, but not when it extends {{Super2}} or
> {{Super3}}. (Java deserialization must invoke the no-arg constructor of the
> closest non-serializable superclass; {{Super1}} has no such constructor,
> {{Super3}} gets a default one, and {{Super2}} is itself serializable.)
> Referencing method-local variables (instead of fields) is fine in all cases;
> the failing field-referencing case looks like this:
> {code}
> import org.apache.spark.{SparkConf, SparkContext}
>
> class App extends Super1(4) with Serializable {
>   val s = "abc"
>   def run(): Unit = {
>     val sc = new SparkContext(
>       new SparkConf().set("spark.master", "local[4]").set("spark.app.name", "serde-test")
>     )
>     try {
>       sc
>         .parallelize(1 to 10)
>         .filter(App.fn(_, s)) // danger! closure references the field `s`, crash ensues
>         .collect()            // driver stack-trace points here
>     } finally {
>       sc.stop()
>     }
>   }
> }
>
> object App {
>   def main(args: Array[String]): Unit = { new App().run() }
>   def fn(i: Int, s: String): Boolean = i % 2 == 0
> }
> {code}
> The task-failure stack trace looks like:
> {code}
> java.io.InvalidClassException: com.MyClass; no valid constructor
>         at java.io.ObjectStreamClass$ExceptionInfo.newInvalidClassException(ObjectStreamClass.java:150)
>         at java.io.ObjectStreamClass.checkDeserialize(ObjectStreamClass.java:790)
>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1782)
>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
> {code}
> and a driver stack-trace will point to the first line that initiates a Spark
> job that exercises the closure/RDD-operation in question.
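>
> A workaround sketch (not part of the original repro; the local name
> {{localS}} is illustrative): copying the field into a method-local {{val}}
> before the RDD operation keeps {{this}} out of the closure's captured state:
> {code}
> def run(): Unit = {
>   val sc = new SparkContext(
>     new SparkConf().set("spark.master", "local[4]").set("spark.app.name", "serde-test")
>   )
>   try {
>     val localS = s // local copy: the closure captures only this String, not `this`
>     sc.parallelize(1 to 10)
>       .filter(App.fn(_, localS)) // no App instance in the closure; serializes cleanly
>       .collect()
>   } finally {
>     sc.stop()
>   }
> }
> {code}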
> Not sure how much this should be considered a problem with Spark's
> closure-serialization logic vs. plain Java serialization, but if the former
> gets revisited (e.g. for Scala 2.12 support), this kind of interaction could
> be handled more gracefully.
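>
> For what it's worth, the same exception reproduces with plain JDK
> serialization and no Spark at all (a minimal sketch; {{Sub}} and
> {{JdkRepro}} are illustrative names):
> {code}
> import java.io._
>
> class Super1(n: Int) // not Serializable, no zero-arg constructor
> class Sub extends Super1(4) with Serializable
>
> object JdkRepro {
>   def main(args: Array[String]): Unit = {
>     val buf = new ByteArrayOutputStream()
>     val out = new ObjectOutputStream(buf)
>     out.writeObject(new Sub) // writing succeeds; the constructor check happens on read
>     out.close()
>     val in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
>     in.readObject() // throws java.io.InvalidClassException: Sub; no valid constructor
>   }
> }
> {code}
> It fails in the same {{ObjectStreamClass.checkDeserialize}} frames as the
> trace above, pointing at Java serialization's requirement that the closest
> non-serializable superclass have an accessible no-arg constructor.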