Heedo Lee created SPARK-35912:
---------------------------------
Summary: [SQL] JSON read behavior is different depending on the
cache setting when nullable is false.
Key: SPARK-35912
URL: https://issues.apache.org/jira/browse/SPARK-35912
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.1.1
Reporter: Heedo Lee
The following code reproduces the issue.
{code:java}
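import org.apache.spark.sql.Encoders
import spark.implicits._ // needed for .toDS; already in scope in spark-shell
// Int fields of a case class are derived as non-nullable StructFields.
case class TestSchema(x: Int, y: Int)
case class BaseSchema(value: TestSchema)
val schema = Encoders.product[BaseSchema].schema
// The JSON input deliberately omits the field y.
val testDS = Seq("""{"value":{"x":1}}""", """{"value":{"x":2}}""").toDS
val jsonDS = spark.read.schema(schema).json(testDS)
// Without caching, the missing field y is shown as null.
jsonDS.show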
+---------+
| value|
+---------+
|{1, null}|
|{2, null}|
+---------+
// After caching, the same missing field y is shown as 0.
jsonDS.cache.show
+------+
| value|
+------+
|{1, 0}|
|{2, 0}|
+------+
{code}
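The discrepancy occurs when the schema contains a nested StructType whose
StructField entries have nullable set to false. In the example, the JSON input
omits y: without caching the missing value is shown as null, whereas after
caching it is shown as 0.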
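As a rough sketch of how the nullability could be inspected and relaxed, the snippet below prints the derived schema and then rebuilds it with every field marked nullable. It assumes a spark-shell session where `spark` is available. `toNullable` is a hypothetical helper written for this illustration and is not part of the Spark API. Whether reading with the relaxed schema makes the cached and uncached results agree on 3.1.1 has not been verified here.

{code:scala}
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.types.{ArrayType, DataType, MapType, StructType}
import spark.implicits._ // assumes a SparkSession named `spark`, as in spark-shell

case class TestSchema(x: Int, y: Int)
case class BaseSchema(value: TestSchema)

val schema = Encoders.product[BaseSchema].schema
// Print the nullable flag of every field, including the nested ones.
schema.printTreeString()

// Hypothetical helper (not a Spark API): recursively mark every field as nullable.
def toNullable(dt: DataType): DataType = dt match {
  case StructType(fields) =>
    StructType(fields.map(f => f.copy(dataType = toNullable(f.dataType), nullable = true)))
  case ArrayType(elementType, _) =>
    ArrayType(toNullable(elementType), containsNull = true)
  case MapType(keyType, valueType, _) =>
    MapType(toNullable(keyType), toNullable(valueType), valueContainsNull = true)
  case other => other
}

val nullableSchema = toNullable(schema).asInstanceOf[StructType]
val testDS = Seq("""{"value":{"x":1}}""", """{"value":{"x":2}}""").toDS
val jsonDS = spark.read.schema(nullableSchema).json(testDS)
// Compare the two outputs to see whether they now agree.
jsonDS.show
jsonDS.cache.show
{code}

Relaxing the schema only sidesteps the symptom. The underlying question of how a non-nullable field that is missing from the JSON input should be handled remains for the reader.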