Michele Rastelli created SPARK-45254:
----------------------------------------
Summary: Non-nullable schema is not effective in DF from JSON
Key: SPARK-45254
URL: https://issues.apache.org/jira/browse/SPARK-45254
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.4.1, 3.3.3
Reporter: Michele Rastelli
In Spark 3.3 and 3.4, when a DataFrame is read from JSON with a user-supplied schema that contains non-nullable fields,
the resulting DataFrame's schema reports those fields as nullable.
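{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{BooleanType, StructField, StructType}

// Spark recommends defining main() rather than extending scala.App.
object Foo {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession.builder()
      .appName("foo")
      .master("local[*]")
      .config("spark.driver.host", "127.0.0.1")
      .getOrCreate()

    // Field "a" is declared non-nullable in the user-supplied schema.
    val schema = StructType(Array(StructField("a", BooleanType, nullable = false)))

    import spark.implicits._ // provides .toDS on Seq[String]

    // One of the three JSON records has a null value for "a".
    val df = spark.read.schema(schema).json(Seq(
      """{"a":null}""",
      """{"a":true}""",
      """{"a":false}"""
    ).toDS)

    df.collect()
      .map(_.toString())
      .foreach(println(_))

    schema.printTreeString()    // requested schema
    df.schema.printTreeString() // schema of the resulting DataFrame

    spark.stop()
  }
}
{code}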
Produces:
{code:java}
[null]
[true]
[false]
root
 |-- a: boolean (nullable = false)
root
 |-- a: boolean (nullable = true)
{code}
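A hedged sketch of how the mismatch could be detected programmatically rather than by reading `printTreeString` output. It assumes spark-sql (3.3 or 3.4) is on the classpath; the object `NullabilityCheck` and the helper `nullabilityMismatches` are hypothetical names introduced for illustration, and only top-level fields are compared (nested structs, arrays and maps are not inspected).

```scala
import org.apache.spark.sql.types.{BooleanType, StructField, StructType}

// Hypothetical helper: not part of Spark's API.
object NullabilityCheck {
  // Returns (fieldName, requestedNullable, actualNullable) for every top-level
  // field present in both schemas whose nullability flag differs.
  def nullabilityMismatches(requested: StructType,
                            actual: StructType): Seq[(String, Boolean, Boolean)] = {
    val actualByName = actual.fields.map(f => f.name -> f).toMap
    requested.fields.toSeq.flatMap { r =>
      actualByName.get(r.name).collect {
        case a if a.nullable != r.nullable => (r.name, r.nullable, a.nullable)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val requested = StructType(Array(StructField("a", BooleanType, nullable = false)))
    // Mirrors the schema that df.schema reports in the reproducer.
    val reported = StructType(Array(StructField("a", BooleanType, nullable = true)))
    nullabilityMismatches(requested, reported).foreach { case (name, req, act) =>
      println(s"$name: requested nullable = $req, got nullable = $act")
    }
  }
}
```

Applied to `schema` and `df.schema` from the reproducer, a non-empty result would indicate that the requested non-nullable constraint was not kept.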
--
This message was sent by Atlassian Jira
(v8.20.10#820010)