[ https://issues.apache.org/jira/browse/SPARK-45896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17785234#comment-17785234 ]
Bruce Robbins commented on SPARK-45896:
---------------------------------------

I think I have a handle on this and will make a PR shortly.

> Expression encoding fails for Seq/Map of Option[Seq/Date/Timestamp/BigDecimal]
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-45896
>                 URL: https://issues.apache.org/jira/browse/SPARK-45896
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.1, 3.5.0
>            Reporter: Bruce Robbins
>            Priority: Major
>
> The following action fails on 3.4.1, 3.5.0, and master:
> {noformat}
> scala> val df = Seq(Seq(Some(Seq(0)))).toDF("a")
> val df = Seq(Seq(Some(Seq(0)))).toDF("a")
> org.apache.spark.SparkRuntimeException: [EXPRESSION_ENCODING_FAILED] Failed
> to encode a value of the expressions: mapobjects(lambdavariable(MapObject,
> ObjectType(class java.lang.Object), true, -1),
> mapobjects(lambdavariable(MapObject, ObjectType(class java.lang.Object),
> true, -2), assertnotnull(validateexternaltype(lambdavariable(MapObject,
> ObjectType(class java.lang.Object), true, -2), IntegerType, IntegerType)),
> unwrapoption(ObjectType(interface scala.collection.immutable.Seq),
> validateexternaltype(lambdavariable(MapObject, ObjectType(class
> java.lang.Object), true, -1), ArrayType(IntegerType,false), ObjectType(class
> scala.Option))), None), input[0, scala.collection.immutable.Seq, true], None)
> AS value#0 to a row. SQLSTATE: 42846
> ...
> Caused by: java.lang.RuntimeException: scala.Some is not a valid external
> type for schema of array<int>
>   at
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_0$(Unknown
> Source)
>   ...
> {noformat}
> However, it succeeds on 3.3.3:
> {noformat}
> scala> val df = Seq(Seq(Some(Seq(0)))).toDF("a")
> df: org.apache.spark.sql.DataFrame = [a: array<array<int>>]
> scala> df.collect
> res0: Array[org.apache.spark.sql.Row] = Array([WrappedArray(WrappedArray(0))])
> {noformat}
> Map of Option[Seq] also fails on 3.4.1, 3.5.0, and master:
> {noformat}
> scala> val df = Seq(Map(0 -> Some(Seq(0)))).toDF("a")
> val df = Seq(Map(0 -> Some(Seq(0)))).toDF("a")
> org.apache.spark.SparkRuntimeException: [EXPRESSION_ENCODING_FAILED] Failed
> to encode a value of the expressions:
> externalmaptocatalyst(lambdavariable(ExternalMapToCatalyst_key,
> ObjectType(class java.lang.Object), false, -1),
> assertnotnull(validateexternaltype(lambdavariable(ExternalMapToCatalyst_key,
> ObjectType(class java.lang.Object), false, -1), IntegerType, IntegerType)),
> lambdavariable(ExternalMapToCatalyst_value, ObjectType(class
> java.lang.Object), true, -2), mapobjects(lambdavariable(MapObject,
> ObjectType(class java.lang.Object), true, -3),
> assertnotnull(validateexternaltype(lambdavariable(MapObject, ObjectType(class
> java.lang.Object), true, -3), IntegerType, IntegerType)),
> unwrapoption(ObjectType(interface scala.collection.immutable.Seq),
> validateexternaltype(lambdavariable(ExternalMapToCatalyst_value,
> ObjectType(class java.lang.Object), true, -2), ArrayType(IntegerType,false),
> ObjectType(class scala.Option))), None), input[0,
> scala.collection.immutable.Map, true]) AS value#0 to a row. SQLSTATE: 42846
> ...
> Caused by: java.lang.RuntimeException: scala.Some is not a valid external
> type for schema of array<int>
>   at
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_0$(Unknown
> Source)
>   ...
> {noformat}
> As with the first example, this succeeds on 3.3.3:
> {noformat}
> scala> val df = Seq(Map(0 -> Some(Seq(0)))).toDF("a")
> df: org.apache.spark.sql.DataFrame = [a: map<int,array<int>>]
> scala> df.collect
> res0: Array[org.apache.spark.sql.Row] = Array([Map(0 -> WrappedArray(0))])
> {noformat}
> Other cases that fail on 3.4.1, 3.5.0, and master but work fine on 3.3.3:
> - {{Seq[Option[Timestamp]]}}
> - {{Map[Option[Timestamp]]}}
> - {{Seq[Option[Date]]}}
> - {{Map[Option[Date]]}}
> - {{Seq[Option[BigDecimal]]}}
> - {{Map[Option[BigDecimal]]}}
> However, the following work fine on 3.3.3, 3.4.1, 3.5.0, and master:
> - {{Seq[Option[Map]]}}
> - {{Map[Option[Map]]}}
> - {{Seq[Option[<primitive-type>]]}}
> - {{Map[Option[<primitive-type>]]}}
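
The ticket lists the Timestamp, Date, and BigDecimal cases by type only, without reproduction code. The following is a minimal sketch of how those cases might be exercised, modeled on the two `toDF` examples in the report. It is not taken from the ticket: the sample values, the `Int` map keys, the `attempts` name, and the application name are all assumptions chosen for illustration. It can be pasted into spark-shell or run as a script against a local Spark installation.

```scala
// Hypothetical check script; not part of the original report.
import java.sql.{Date, Timestamp}
import scala.util.Try
import org.apache.spark.sql.SparkSession

// Reuses the active session in spark-shell, otherwise starts a local one.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SPARK-45896-check")
  .getOrCreate()
import spark.implicits._

// Arbitrary sample values (assumptions, any valid value should do).
val ts = Timestamp.valueOf("2023-11-01 00:00:00")
val d  = Date.valueOf("2023-11-01")
val bd = BigDecimal("1.5")

// One attempt per case listed in the ticket. Try captures the
// encoding exception so the remaining cases still run.
val attempts: Seq[(String, Try[Unit])] = Seq(
  "Seq[Option[Timestamp]]"       -> Try { Seq(Seq(Option(ts))).toDF("a").collect(); () },
  "Map[Int, Option[Timestamp]]"  -> Try { Seq(Map(0 -> Option(ts))).toDF("a").collect(); () },
  "Seq[Option[Date]]"            -> Try { Seq(Seq(Option(d))).toDF("a").collect(); () },
  "Map[Int, Option[Date]]"       -> Try { Seq(Map(0 -> Option(d))).toDF("a").collect(); () },
  "Seq[Option[BigDecimal]]"      -> Try { Seq(Seq(Option(bd))).toDF("a").collect(); () },
  "Map[Int, Option[BigDecimal]]" -> Try { Seq(Map(0 -> Option(bd))).toDF("a").collect(); () }
)

// Report which cases encoded successfully and which threw.
attempts.foreach { case (label, result) =>
  val status = result.fold(e => s"failed (${e.getClass.getSimpleName})", _ => "ok")
  println(s"$label: $status")
}
```

Going by the report, each of these cases would be expected to fail on 3.4.1, 3.5.0, and master, and to succeed on 3.3.3. The sketch has not been run against those versions.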