Dave Knoester created ZEPPELIN-2474:
---------------------------------------
             Summary: ClassCastException when interpreting UDFs from a String
                 Key: ZEPPELIN-2474
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2474
             Project: Zeppelin
          Issue Type: Bug
          Components: zeppelin-interpreter
         Environment: OS X 10.11.6, spark-2.1.0-bin-hadoop2.7, Scala 2.11.8 (bundled with Spark), Java 1.8.0_121
            Reporter: Dave Knoester
            Priority: Blocker

Hi Zeppelin team,

I'm cross-posting this issue (https://issues.apache.org/jira/browse/SPARK-20525) here in the hope that someone here can help, since Zeppelin has already solved it.

I'm trying to interpret a string containing Scala code from inside a Spark session. Everything works fine except for User Defined Function-like things (UDFs, map, flatMap, etc.). For example, this code works in Zeppelin:

import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import spark.implicits._

val upper: String => String = _.toUpperCase
val upperUDF = udf(upper)
val df = spark.sparkContext.parallelize(Seq("foo", "bar")).toDF.withColumn("UPPER", upperUDF($"value"))
df.show()

However, this code fails when run in a spark-shell:

import scala.tools.nsc.GenericRunnerSettings
import scala.tools.nsc.interpreter.IMain

val settings = new GenericRunnerSettings(println _)
settings.usejavacp.value = true
val interpreter = new IMain(settings, new java.io.PrintWriter(System.out))
interpreter.bind("spark", spark)
interpreter.interpret("import org.apache.spark.sql.functions._\nimport spark.implicits._\nval upper: String => String = _.toUpperCase\nval upperUDF = udf(upper)\nspark.sparkContext.parallelize(Seq(\"foo\",\"bar\")).toDF.withColumn(\"UPPER\", upperUDF($\"value\")).show")

Exception:

Caused by: java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
	at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
	at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2237)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2155)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2013)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2231)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2155)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2013)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:80)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Any help is appreciated!
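
For context on how this class of failure is normally avoided: Spark's own REPL (org.apache.spark.repl.Main) passes -Yrepl-class-based and -Yrepl-outdir to its interpreter and sets spark.repl.class.outputDir on the SparkConf before creating the SparkContext, so that the classes compiled for each interpreted line are served to executors; Zeppelin's Spark interpreter does equivalent wiring. Without it, executors cannot load those generated classes and deserialization can fail with errors like the one above. Below is a minimal, untested sketch of that wiring for an embedded IMain in a standalone driver. The temp-directory handling, the local[*] master, and the object/main scaffolding are illustrative assumptions, not code from this issue or from Zeppelin.

import java.nio.file.Files

import scala.tools.nsc.GenericRunnerSettings
import scala.tools.nsc.interpreter.IMain

import org.apache.spark.sql.SparkSession

object EmbeddedReplSketch {
  def main(args: Array[String]): Unit = {
    // Directory where the embedded interpreter will write the classes it compiles.
    val outputDir = Files.createTempDirectory("repl-classes").toFile

    // Create the SparkSession *after* pointing it at that directory, so the
    // SparkContext can serve the interpreter-generated classes to executors.
    val spark = SparkSession.builder()
      .master("local[*]")                                              // assumption: adjust for your cluster
      .appName("embedded-repl-sketch")
      .config("spark.repl.class.outputDir", outputDir.getAbsolutePath)
      .getOrCreate()

    val settings = new GenericRunnerSettings(println _)
    settings.usejavacp.value = true
    // Same flags spark-shell passes to its own interpreter.
    settings.processArguments(
      List("-Yrepl-class-based", "-Yrepl-outdir", outputDir.getAbsolutePath), true)

    val interpreter = new IMain(settings, new java.io.PrintWriter(System.out))
    interpreter.bind("spark", spark)
    interpreter.interpret(
      """import org.apache.spark.sql.functions._
        |import spark.implicits._
        |val upper: String => String = _.toUpperCase
        |val upperUDF = udf(upper)
        |spark.sparkContext.parallelize(Seq("foo", "bar")).toDF
        |  .withColumn("UPPER", upperUDF($"value")).show()
        |""".stripMargin)

    spark.stop()
  }
}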