Michele Rastelli created SPARK-45254:
----------------------------------------

             Summary: Non-nullable schema is not effective in DF from JSON
                 Key: SPARK-45254
                 URL: https://issues.apache.org/jira/browse/SPARK-45254
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.4.1, 3.3.3
            Reporter: Michele Rastelli


In Spark 3.3 and 3.4, when a DataFrame is created from JSON with a user-supplied
schema that contains non-nullable fields, the schema of the resulting DataFrame
reports those fields as nullable.

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{BooleanType, StructField, StructType}
object Foo extends App {
  val spark: SparkSession = SparkSession.builder()
    .appName("foo")
    .master("local[*]")
    .config("spark.driver.host", "127.0.0.1")
    .getOrCreate()
  val schema = StructType(Array(StructField("a", BooleanType, nullable = false)))
  import spark.implicits._
  val df = spark.read.schema(schema).json(Seq(
    """{"a":null}""",
    """{"a":true}""",
    """{"a":false}""",
  ).toDS)
  df.collect()
    .map(_.toString())
    .foreach(println(_))
  schema.printTreeString()
  df.schema.printTreeString()
}
{code}

Produces:

{code:java}
[null]
[true]
[false]
root
 |-- a: boolean (nullable = false)
root
 |-- a: boolean (nullable = true)
{code}
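
The mismatch above can also be detected programmatically. The following is a minimal sketch, not part of the original reproduction: the object name NullabilityCheck and the values declared, relaxed and reapplied are hypothetical names chosen for illustration. It lists the fields that were declared non-nullable but come back as nullable, then rebuilds the DataFrame with createDataFrame(df.rdd, declared) as one possible workaround. That workaround is an assumption rather than a confirmed fix: it should restore the declared nullability in the reported schema, but Spark does not validate the rows at creation time, so input that really contains nulls (such as the first record in the reproduction) may fail at execution.

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{BooleanType, StructField, StructType}

// Hypothetical helper object, not part of the original report.
object NullabilityCheck extends App {
  val spark: SparkSession = SparkSession.builder()
    .appName("nullability-check")
    .master("local[*]")
    .config("spark.driver.host", "127.0.0.1")
    .getOrCreate()
  import spark.implicits._

  // Same declared schema as in the reproduction above.
  val declared = StructType(Array(StructField("a", BooleanType, nullable = false)))

  // Input without nulls, so that the workaround below can be executed safely.
  val df = spark.read.schema(declared).json(Seq("""{"a":true}""").toDS)

  // Fields declared non-nullable that the DataFrame schema reports as nullable.
  val relaxed = declared.fields
    .filter(!_.nullable)
    .map(_.name)
    .filter(name => df.schema(name).nullable)
  println(s"Fields whose nullability was relaxed: ${relaxed.mkString(", ")}")

  // Possible workaround (assumption): re-apply the declared schema to the rows.
  val reapplied = spark.createDataFrame(df.rdd, declared)
  reapplied.schema.printTreeString()

  spark.stop()
}
{code}

On the affected versions the first println is expected to list field a, matching the second schema tree printed in the output above.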