[ https://issues.apache.org/jira/browse/SPARK-40507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anil Dasari updated SPARK-40507: -------------------------------- Description: Dataframe saveAsTable sets all columns as optional/nullable while creating the table, here: [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L531] (`outputColumns.toStructType.asNullable`). As a result, the source Parquet schema and the Hive table schema do not match, which is problematic when a large-dataframe pipeline uses Hive as temporary storage to avoid memory pressure. Hive 3.x supports NOT NULL constraints on table columns. Please add support for NOT NULL constraints on Spark SQL Hive tables.
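The nullability loss described above can be reproduced with a short sketch (a minimal illustration, not from the ticket; it assumes a Hive-enabled local SparkSession, and the table name `tmp_tbl` is hypothetical):

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

val spark = SparkSession.builder()
  .appName("nullability-demo")
  .enableHiveSupport()
  .getOrCreate()

// Build a DataFrame whose schema explicitly marks the column as NOT nullable.
val schema = StructType(Seq(StructField("id", LongType, nullable = false)))
val df = spark.createDataFrame(
  spark.sparkContext.parallelize(Seq(Row(1L), Row(2L))), schema)

df.schema.fields.foreach(f => println(s"${f.name} nullable=${f.nullable}"))
// before writing: id nullable=false

// saveAsTable goes through DataSource, which applies
// outputColumns.toStructType.asNullable, so the persisted table schema
// is all-nullable regardless of the source schema.
df.write.mode("overwrite").saveAsTable("tmp_tbl")  // hypothetical table name

val readBack = spark.table("tmp_tbl")
readBack.schema.fields.foreach(f => println(s"${f.name} nullable=${f.nullable}"))
// after reading back: id nullable=true
```

A common workaround is to reapply the original schema after reading back, e.g. `spark.createDataFrame(readBack.rdd, schema)`, but that only fixes the in-memory DataFrame, not the table definition in the Hive metastore.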
> Spark creates optional columns in hive table for fields that are not null
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-40507
>                 URL: https://issues.apache.org/jira/browse/SPARK-40507
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Anil Dasari
>            Priority: Major
>
> Dataframe saveAsTable sets all columns as optional/nullable while creating
> the table, here:
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L531]
> (`outputColumns.toStructType.asNullable`).
> As a result, the source Parquet schema and the Hive table schema do not
> match, which is problematic when a large-dataframe pipeline uses Hive as
> temporary storage to avoid memory pressure.
> Hive 3.x supports NOT NULL constraints on table columns. Please add support
> for NOT NULL constraints on Spark SQL Hive tables.

-- This message was sent by Atlassian Jira (v8.20.10#820010)