[ https://issues.apache.org/jira/browse/SPARK-22288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16207706#comment-16207706 ]
Ryan Williams commented on SPARK-22288:
---------------------------------------
Makes sense, fine with me to "Won't Fix".
bq. You can always use a different serializer like kryo
NB: this is closure-serialization, where [only Java serialization has ever worked, afaik|https://issues.apache.org/jira/browse/SPARK-12414].
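To illustrate (a sketch, assuming the standard {{spark.serializer}} setting): Kryo only replaces the data serializer, while task closures still go through the Java-based closure serializer, so the failure below is unaffected:
{code}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.master", "local[4]")
  .set("spark.app.name", "serde-test")
  // Switches shuffle/cache/broadcast data serialization to Kryo; closures
  // are still serialized with Java serialization, so the crash persists.
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
{code}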
> Tricky interaction between closure-serialization and inheritance results in
> confusing failure
> ---------------------------------------------------------------------------------------------
>
> Key: SPARK-22288
> URL: https://issues.apache.org/jira/browse/SPARK-22288
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.2.0
> Reporter: Ryan Williams
> Priority: Minor
>
> Documenting this since I've run into it a few times; [full repro / discussion here|https://github.com/ryan-williams/spark-bugs/tree/serde].
> Given 3 possible super-classes:
> {code}
> class Super1(n: Int)                       // not Serializable, no zero-arg constructor
> class Super2(n: Int) extends Serializable
> class Super3                               // not Serializable, but gets a default zero-arg constructor
> {code}
> A subclass that passes a closure to an RDD operation (e.g. {{map}} or
> {{filter}}), where the closure references one of the subclass's fields, will
> throw a {{java.io.InvalidClassException: …; no valid constructor}} exception
> when the subclass extends {{Super1}}, but not when it extends {{Super2}} or
> {{Super3}}. (Java deserialization must invoke the no-arg constructor of the
> closest non-serializable superclass; {{Super1}} has no such constructor,
> {{Super3}} gets a default one, and {{Super2}} is itself serializable.)
> Referencing method-local variables (instead of fields) is fine in all cases;
> the failing field-referencing case looks like this:
> {code}
> import org.apache.spark.{SparkConf, SparkContext}
>
> class App extends Super1(4) with Serializable {
>   val s = "abc"
>   def run(): Unit = {
>     val sc = new SparkContext(
>       new SparkConf().set("spark.master", "local[4]").set("spark.app.name", "serde-test")
>     )
>     try {
>       sc
>         .parallelize(1 to 10)
>         .filter(App.fn(_, s)) // danger! closure references the field `s`, crash ensues
>         .collect()            // driver stack-trace points here
>     } finally {
>       sc.stop()
>     }
>   }
> }
>
> object App {
>   def main(args: Array[String]): Unit = { new App().run() }
>   def fn(i: Int, s: String): Boolean = i % 2 == 0
> }
> {code}
> The task-failure stack trace looks like:
> {code}
> java.io.InvalidClassException: com.MyClass; no valid constructor
>         at java.io.ObjectStreamClass$ExceptionInfo.newInvalidClassException(ObjectStreamClass.java:150)
>         at java.io.ObjectStreamClass.checkDeserialize(ObjectStreamClass.java:790)
>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1782)
>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
> {code}
> and a driver stack-trace will point to the first line that initiates a Spark
> job that exercises the closure/RDD-operation in question.
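>
> A workaround sketch (not part of the original repro; the local name
> {{localS}} is illustrative): copying the field into a method-local {{val}}
> before the RDD operation keeps {{this}} out of the closure's captured state:
> {code}
> def run(): Unit = {
>   val sc = new SparkContext(
>     new SparkConf().set("spark.master", "local[4]").set("spark.app.name", "serde-test")
>   )
>   try {
>     val localS = s // local copy: the closure captures only this String, not `this`
>     sc.parallelize(1 to 10)
>       .filter(App.fn(_, localS)) // no App instance in the closure; serializes cleanly
>       .collect()
>   } finally {
>     sc.stop()
>   }
> }
> {code}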
> Not sure how much this should be considered a problem with Spark's
> closure-serialization logic vs. plain Java serialization, but if the former
> gets revisited (e.g. for Scala 2.12 support), this kind of interaction could
> be handled more gracefully.
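>
> For what it's worth, the same exception reproduces with plain JDK
> serialization and no Spark at all (a minimal sketch; {{Sub}} and
> {{JdkRepro}} are illustrative names):
> {code}
> import java.io._
>
> class Super1(n: Int) // not Serializable, no zero-arg constructor
> class Sub extends Super1(4) with Serializable
>
> object JdkRepro {
>   def main(args: Array[String]): Unit = {
>     val buf = new ByteArrayOutputStream()
>     val out = new ObjectOutputStream(buf)
>     out.writeObject(new Sub) // writing succeeds; the constructor check happens on read
>     out.close()
>     val in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
>     in.readObject() // throws java.io.InvalidClassException: Sub; no valid constructor
>   }
> }
> {code}
> It fails in the same {{ObjectStreamClass.checkDeserialize}} frames as the
> trace above, pointing at Java serialization's requirement that the closest
> non-serializable superclass have an accessible no-arg constructor.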