[ https://issues.apache.org/jira/browse/SPARK-28651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu updated SPARK-28651:
---------------------------------
    Labels: release-notes  (was: )

> Streaming file source doesn't change the schema to nullable automatically
> -------------------------------------------------------------------------
>
>                 Key: SPARK-28651
>                 URL: https://issues.apache.org/jira/browse/SPARK-28651
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.4.3
>            Reporter: Tomasz
>            Priority: Major
>              Labels: release-notes
>
> Right now, a batch DataFrame always changes the schema to nullable
> automatically (see this line:
> https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L399).
> However, a streaming DataFrame's schema is read in this line:
> https://github.com/apache/spark/blob/325bc8e9c6187a96b33a033fbb0145dfca619135/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L259
> which doesn't change the schema to nullable automatically.
> We should make the streaming DataFrame consistent with batch.
> The inconsistency can cause corrupted Parquet files due to the schema mismatch.
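
For readers who want to picture the difference described in the issue, here is a minimal sketch in plain Python. It does not use Spark: `Field`, `Struct`, and `as_nullable` are hypothetical names made up for illustration, and the real logic lives in the `DataSource.scala` lines linked in the issue. It only models the described behaviour, assuming the batch path marks every field (nested ones included) as nullable while the streaming path keeps the schema as given.

```python
from dataclasses import dataclass, replace
from typing import Tuple, Union


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field (not a Spark class)."""
    name: str
    dtype: Union[str, "Struct"]
    nullable: bool


@dataclass(frozen=True)
class Struct:
    """Hypothetical stand-in for a struct schema (not a Spark class)."""
    fields: Tuple[Field, ...]


def as_nullable(schema: Struct) -> Struct:
    """Return a copy of the schema with every field, including nested ones, nullable."""
    return Struct(tuple(
        replace(
            f,
            # Recurse into nested structs so inner fields are relaxed too.
            dtype=as_nullable(f.dtype) if isinstance(f.dtype, Struct) else f.dtype,
            nullable=True,
        )
        for f in schema.fields
    ))


# A user-supplied schema that declares non-nullable fields.
user_schema = Struct((
    Field("id", "long", nullable=False),
    Field("point", Struct((Field("x", "double", nullable=False),)), nullable=False),
))

# Assumption: models the batch path, which relaxes the schema to nullable.
batch_schema = as_nullable(user_schema)

# Assumption: models the streaming path, which keeps the schema unchanged.
streaming_schema = user_schema
```

In this toy model the two paths end up with different schemas for the same files, which is the kind of mismatch the issue says can lead to corrupted Parquet output.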