[ 
https://issues.apache.org/jira/browse/SPARK-28651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-28651:
---------------------------------
    Docs Text: Since Spark 3.0.0, all fields in the schema of a Structured 
Streaming file source are forced to be nullable. This protects users from 
data corruption when the specified or inferred schema is not compatible with 
the actual data. To restore the original behavior, set the SQL conf 
"spark.sql.streaming.fileSource.schema.forceNullable" to "false". This flag 
was added to reduce migration work when upgrading to Spark 3.0.0 and will be 
removed in a future release. Please update your code to work with the new 
behavior as soon as possible.
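
As a minimal sketch of the opt-out described in the Docs Text, assuming a PySpark application: the conf key and value come from the note itself, while the helper name `apply_legacy_nullability` and the stand-in session classes are hypothetical and exist only so the snippet runs without Spark installed. A real `SparkSession` exposes the same `conf.set(key, value)` call.

```python
# Conf key and value taken from the Docs Text of SPARK-28651.
FORCE_NULLABLE_KEY = "spark.sql.streaming.fileSource.schema.forceNullable"


def apply_legacy_nullability(session):
    """Restore the pre-3.0.0 behavior on the given session (hypothetical helper).

    `session` is assumed to expose `conf.set(key, value)`, as a PySpark
    SparkSession does.
    """
    session.conf.set(FORCE_NULLABLE_KEY, "false")
    return session


class _FakeConf:
    """Stand-in for the session's runtime config, for illustration only."""

    def __init__(self):
        self.values = {}

    def set(self, key, value):
        self.values[key] = value


class _FakeSession:
    """Stand-in for SparkSession so the sketch runs without a cluster."""

    def __init__(self):
        self.conf = _FakeConf()


session = apply_legacy_nullability(_FakeSession())
print(session.conf.values)
```

Because the flag is described as temporary, a setting like this is best treated as a stopgap while the job is migrated to nullable fields.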

> Streaming file source doesn't change the schema to nullable automatically
> -------------------------------------------------------------------------
>
>                 Key: SPARK-28651
>                 URL: https://issues.apache.org/jira/browse/SPARK-28651
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.4.3
>            Reporter: Tomasz Magdanski
>            Assignee: Shixiong Zhu
>            Priority: Major
>              Labels: release-notes
>             Fix For: 3.0.0
>
>
> Right now, a batch DataFrame always changes the schema to nullable 
> automatically (see this line: 
> https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L399).
> However, a streaming DataFrame's schema is read in this line 
> https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L259
>  which doesn't change the schema to nullable automatically.
> We should make streaming DataFrames consistent with batch.
> The current streaming behavior can cause corrupted Parquet files due to 
> the schema mismatch.
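
To illustrate what "changing the schema to nullable" means in the description above, here is a small sketch in plain Python. It is not Spark's implementation; it assumes the schema is given in the JSON dictionary form Spark uses to serialize schemas (`nullable` on struct fields, `containsNull` on arrays, `valueContainsNull` on maps), and the function name `force_nullable` is hypothetical.

```python
import copy


def force_nullable(data_type):
    """Return a copy of a JSON-style data type with every nullability flag set to True."""
    # Primitive types are plain strings such as "long" or "string".
    if not isinstance(data_type, dict):
        return data_type
    result = copy.deepcopy(data_type)
    kind = result.get("type")
    if kind == "struct":
        for field in result["fields"]:
            field["nullable"] = True
            field["type"] = force_nullable(field["type"])
    elif kind == "array":
        result["containsNull"] = True
        result["elementType"] = force_nullable(result["elementType"])
    elif kind == "map":
        result["valueContainsNull"] = True
        result["keyType"] = force_nullable(result["keyType"])
        result["valueType"] = force_nullable(result["valueType"])
    return result


# A user-declared schema with non-nullable fields (illustrative data only).
declared = {
    "type": "struct",
    "fields": [
        {"name": "id", "type": "long", "nullable": False, "metadata": {}},
        {
            "name": "tags",
            "type": {"type": "array", "elementType": "string", "containsNull": False},
            "nullable": False,
            "metadata": {},
        },
    ],
}

relaxed = force_nullable(declared)
print([field["nullable"] for field in relaxed["fields"]])
```

The input schema is left untouched, so the declared and relaxed versions can be compared side by side.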



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
