Gerard Alexander created SPARK-26984:
----------------------------------------
Summary: Incompatibility between Spark releases - Some(null)
Key: SPARK-26984
URL: https://issues.apache.org/jira/browse/SPARK-26984
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.4.0
Environment: Linux CentOS, Databricks.
Reporter: Gerard Alexander
Fix For: 2.4.1, 2.4.2
Please refer to
[https://stackoverflow.com/questions/54851205/why-does-somenull-throw-nullpointerexception-in-spark-2-4-but-worked-in-2-2/54861152#54861152.]
NB: I am not sure the priority is set correctly; please re-evaluate it as needed.
Consider the following:
{{val df = Seq( }}
{{ (1, Some("a"), Some(1)), }}
{{ (2, Some(null), Some(2)), }}
{{ (3, Some("c"), Some(3)), }}
{{ (4, None, None) ).toDF("c1", "c2", "c3")}}
In Spark 2.2.1 (on MapR), the Some(null) row works fine; in Spark 2.4.0 on
Databricks, the same code fails with the following error:
{{java.lang.RuntimeException: Error while encoding: java.lang.NullPointerException
assertnotnull(assertnotnull(input[0, scala.Tuple3, true]))._1 AS _1#6
staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, unwrapoption(ObjectType(class java.lang.String), assertnotnull(assertnotnull(input[0, scala.Tuple3, true]))._2), true, false) AS _2#7
unwrapoption(IntegerType, assertnotnull(assertnotnull(input[0, scala.Tuple3, true]))._3) AS _3#8
  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:293)
  at org.apache.spark.sql.SparkSession.$anonfun$createDataset$1(SparkSession.scala:472)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)
  at scala.collection.immutable.List.foreach(List.scala:388)
  at scala.collection.TraversableLike.map(TraversableLike.scala:233)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:226)
  at scala.collection.immutable.List.map(List.scala:294)
  at org.apache.spark.sql.SparkSession.createDataset(SparkSession.scala:472)
  at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:377)
  at org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:228)
  ... 57 elided
Caused by: java.lang.NullPointerException
  at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:109)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:289)
  ... 66 more}}
One can argue that this can be worked around in user code, but existing code
bases that rely on the earlier behavior may well be affected.
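As an illustration of the kind of user-side workaround alluded to above, here is a minimal plain-Scala sketch (no Spark needed) contrasting Some(null) with Option(null). The object name SomeNullSketch and the helper normalize are hypothetical and not part of Spark; whether normalizing values this way avoids the exception in 2.4.0 has not been verified here and is only an assumption.

```scala
// Plain Scala, no Spark required: contrasts Some(null) with Option(null).
object SomeNullSketch {
  // Hypothetical helper (not part of Spark): collapses Some(null) into None
  // and leaves every other Option unchanged.
  def normalize[A](opt: Option[A]): Option[A] = opt.flatMap(Option(_))

  def main(args: Array[String]): Unit = {
    // A non-empty Option that wraps a null reference.
    val wrapped: Option[String] = Some(null)
    // Option.apply maps a null argument to None.
    val safe: Option[String] = Option(null: String)

    println(wrapped.isDefined)            // true
    println(safe.isDefined)               // false
    println(normalize(wrapped).isDefined) // false
    println(normalize(Some("a")))         // Some(a)
  }
}
```

Under that assumption, writing the second row of the example as (2, Option(null: String), Some(2)) would hand the encoder a None for c2 instead of a Some wrapping null.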
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)