[
https://issues.apache.org/jira/browse/SPARK-28651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shixiong Zhu updated SPARK-28651:
---------------------------------
Docs Text: Since Spark 3.0.0, all fields in the schema of a Structured
Streaming file source are forced to be nullable. This protects users from
data corruption when the specified or inferred schema is not compatible with
the actual data. To restore the previous behavior, set the SQL conf
"spark.sql.streaming.fileSource.schema.forceNullable" to "false". This flag
was added to reduce migration work when upgrading to Spark 3.0.0 and will be
removed in a future release. Please update your code to work with the new
behavior as soon as possible.
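
As a rough illustration of the opt-out described in the Docs Text, the sketch below builds the override and applies it to an existing session through `spark.conf.set`. The helper names `legacy_nullability_conf` and `apply_conf` are hypothetical and are not part of Spark; only the conf key and the value "false" come from the note above, and `spark` is assumed to be a PySpark `SparkSession` created elsewhere.

```python
# Conf key quoted in the Docs Text above.
FORCE_NULLABLE_KEY = "spark.sql.streaming.fileSource.schema.forceNullable"


def legacy_nullability_conf(restore_old_behavior: bool) -> dict:
    """Return SQL conf overrides (hypothetical helper).

    An empty dict leaves the Spark 3.0.0 default (forced nullable) in place.
    """
    return {FORCE_NULLABLE_KEY: "false"} if restore_old_behavior else {}


def apply_conf(spark, overrides: dict) -> None:
    """Apply each override to an existing session via spark.conf.set."""
    for key, value in overrides.items():
        spark.conf.set(key, value)
```

With a real session this would be called as `apply_conf(spark, legacy_nullability_conf(True))` before starting the streaming query. Because the flag is slated for removal, this is best treated as a temporary migration aid.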
> Streaming file source doesn't change the schema to nullable automatically
> -------------------------------------------------------------------------
>
> Key: SPARK-28651
> URL: https://issues.apache.org/jira/browse/SPARK-28651
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 2.4.3
> Reporter: Tomasz Magdanski
> Assignee: Shixiong Zhu
> Priority: Major
> Labels: release-notes
> Fix For: 3.0.0
>
>
> Right now, a batch DataFrame always changes its schema to nullable
> automatically (see this line:
> https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L399).
> However, a streaming DataFrame's schema is read in this line:
> https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L259
> which doesn't change the schema to nullable automatically.
> We should make streaming DataFrames consistent with batch.
> The current behavior can produce corrupted Parquet files due to the schema mismatch.
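
To make "changes the schema to nullable" concrete, here is a minimal sketch that operates on a plain-dict schema laid out like Spark's JSON schema representation (`fields`/`nullable`, `elementType`/`containsNull`, `valueType`/`valueContainsNull`). The function `force_nullable` is a hypothetical stand-in written for illustration; it is not the code at the links above, and it assumes the input follows that dict layout.

```python
def force_nullable(dtype):
    """Return a copy of a Spark-style JSON data type with every
    nullability flag set to True, recursing into nested types."""
    if not isinstance(dtype, dict):
        return dtype  # primitive types are plain strings such as "long"
    kind = dtype.get("type")
    if kind == "struct":
        return {
            "type": "struct",
            "fields": [
                {**f, "type": force_nullable(f["type"]), "nullable": True}
                for f in dtype["fields"]
            ],
        }
    if kind == "array":
        return {
            **dtype,
            "elementType": force_nullable(dtype["elementType"]),
            "containsNull": True,
        }
    if kind == "map":
        return {
            **dtype,
            "keyType": force_nullable(dtype["keyType"]),
            "valueType": force_nullable(dtype["valueType"]),
            "valueContainsNull": True,
        }
    return dtype  # any other type description is returned unchanged


# Example: a user-specified schema that declares non-nullable fields.
schema = {
    "type": "struct",
    "fields": [
        {"name": "id", "type": "long", "nullable": False, "metadata": {}},
        {
            "name": "tags",
            "type": {"type": "array", "elementType": "string", "containsNull": False},
            "nullable": False,
            "metadata": {},
        },
    ],
}
relaxed = force_nullable(schema)
```

After this call every field in `relaxed` is marked nullable, including the array's elements, while `schema` itself is left untouched. A file that turns out to contain nulls then no longer contradicts the declared schema.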