Bruce Robbins created SPARK-23251:
-------------------------------------

             Summary: ClassNotFoundException: scala.Any when there's a missing implicit Map encoder
                 Key: SPARK-23251
                 URL: https://issues.apache.org/jira/browse/SPARK-23251
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.1
         Environment: mac os high sierra, centos 7
            Reporter: Bruce Robbins
In branch-2.2, when you attempt to use row.getValuesMap[Any] without an implicit Map encoder, you get a nice descriptive compile-time error:
{noformat}
scala> df.map(row => row.getValuesMap[Any](List("stationName", "year"))).collect
<console>:26: error: Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
       df.map(row => row.getValuesMap[Any](List("stationName", "year"))).collect
          ^

scala> implicit val mapEncoder = org.apache.spark.sql.Encoders.kryo[Map[String, Any]]
mapEncoder: org.apache.spark.sql.Encoder[Map[String,Any]] = class[value[0]: binary]

scala> df.map(row => row.getValuesMap[Any](List("stationName", "year"))).collect
res1: Array[Map[String,Any]] = Array(Map(stationName -> 007026 99999, year -> 2014), Map(stationName -> 007026 99999, year -> 2014), Map(stationName -> 007026 99999, year -> 2014), etc.......
{noformat}
On the latest master and also on branch-2.3, the transformation compiles (at least in spark-shell), but throws a ClassNotFoundException at runtime:
{noformat}
scala> df.map(row => row.getValuesMap[Any](List("stationName", "year"))).collect
java.lang.ClassNotFoundException: scala.Any
  at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:348)
  at scala.reflect.runtime.JavaMirrors$JavaMirror.javaClass(JavaMirrors.scala:555)
  at scala.reflect.runtime.JavaMirrors$JavaMirror$$anonfun$classToJava$1.apply(JavaMirrors.scala:1211)
  at scala.reflect.runtime.JavaMirrors$JavaMirror$$anonfun$classToJava$1.apply(JavaMirrors.scala:1203)
  at scala.reflect.runtime.TwoWayCaches$TwoWayCache$$anonfun$toJava$1.apply(TwoWayCaches.scala:49)
  at scala.reflect.runtime.Gil$class.gilSynchronized(Gil.scala:19)
  at scala.reflect.runtime.JavaUniverse.gilSynchronized(JavaUniverse.scala:16)
  at scala.reflect.runtime.TwoWayCaches$TwoWayCache.toJava(TwoWayCaches.scala:44)
  at scala.reflect.runtime.JavaMirrors$JavaMirror.classToJava(JavaMirrors.scala:1203)
  at scala.reflect.runtime.JavaMirrors$JavaMirror.runtimeClass(JavaMirrors.scala:194)
  at scala.reflect.runtime.JavaMirrors$JavaMirror.runtimeClass(JavaMirrors.scala:54)
  at org.apache.spark.sql.catalyst.ScalaReflection$.getClassFromType(ScalaReflection.scala:700)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$dataTypeFor$1.apply(ScalaReflection.scala:84)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$dataTypeFor$1.apply(ScalaReflection.scala:65)
  at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
  at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:824)
  at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
  at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$dataTypeFor(ScalaReflection.scala:64)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1.apply(ScalaReflection.scala:512)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1.apply(ScalaReflection.scala:445)
  at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
  at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:824)
  at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
  at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:445)
  at org.apache.spark.sql.catalyst.ScalaReflection$.serializerFor(ScalaReflection.scala:434)
  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:71)
  at org.apache.spark.sql.SQLImplicits.newMapEncoder(SQLImplicits.scala:172)
  ... 49 elided

scala> implicit val mapEncoder = org.apache.spark.sql.Encoders.kryo[Map[String, Any]]
mapEncoder: org.apache.spark.sql.Encoder[Map[String,Any]] = class[value[0]: binary]

scala> df.map(row => row.getValuesMap[Any](List("stationName", "year"))).collect
res1: Array[Map[String,Any]] = Array(Map(stationName -> 007026 99999, year -> 2014), Map(stationName -> 007026 99999, year -> 2014), etc.......
{noformat}
This message is a lot less helpful than the branch-2.2 compile-time error. As with 2.2, explicitly specifying the Map encoder allows the transformation and action to execute.
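Reading the trace bottom-up suggests what changed: the implicit SQLImplicits.newMapEncoder now matches Map[String, Any], and ExpressionEncoder derivation then asks Scala runtime reflection for a Java class corresponding to Any, which ends in Class.forName("scala.Any"). That lookup can never succeed, because Any exists only in the compiler's type system and no scala/Any class file is ever emitted. A minimal sketch of that failure, in plain Scala with no Spark required (the object name is mine, for illustration only):

```scala
// scala.Any has no runtime class, so the Class.forName call that the
// stack trace above ends in must throw ClassNotFoundException.
object AnyHasNoRuntimeClass {
  def main(args: Array[String]): Unit = {
    try {
      Class.forName("scala.Any") // same lookup as in the trace above
      println("unexpectedly found scala.Any")
    } catch {
      case e: ClassNotFoundException =>
        println(s"ClassNotFoundException: ${e.getMessage}")
    }
  }
}
```

This is only a sketch of the root cause, not of the fix; presumably the encoder derivation (or implicit resolution) should reject Any up front with a message like the branch-2.2 one.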
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org