This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-4.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-4.0 by this push:
     new 1c03f72534cc [SPARK-54028][SQL] Use empty schema when altering a view which is not Hive compatible
1c03f72534cc is described below

commit 1c03f72534cc93259dc7e760a1c01fe63211c161
Author: Chiran Ravani <[email protected]>
AuthorDate: Sat Nov 1 03:06:32 2025 +0800

    [SPARK-54028][SQL] Use empty schema when altering a view which is not Hive compatible
    
    ### What changes were proposed in this pull request?
    Spark attempts to save views in a Hive-compatible format and only sets
    the schema to empty if the save operation fails.
    
    However, for some schemas that are not Hive compatible (for example,
    nested field names containing special characters such as ':'), the save
    operation may succeed while subsequent read operations fail. This issue
    arises after the change introduced in
    [SPARK-46934](https://issues.apache.org/jira/browse/SPARK-46934), which
    removed the verifyColumnDataType check during the ALTER TABLE operation.
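    
    For example, a minimal sketch mirroring the updated test (table and view
    names are illustrative): before this fix, the ALTER VIEW below could
    succeed while persisting a schema Hive cannot parse back, so a
    subsequent read of the view failed.
    
    ```scala
    // The nested field name contains a colon, which is not Hive compatible.
    spark.sql("CREATE TABLE t (struct_field STRUCT<`colon:field_name`:STRING>)")
    spark.sql("CREATE VIEW v AS SELECT struct_field FROM t")
    spark.sql("ALTER TABLE t ADD COLUMN (field_1 INT)")
    // Previously this alter succeeded but left a Hive-incompatible schema in
    // the metastore; with this fix the view is stored with an empty schema.
    spark.sql("ALTER VIEW v AS SELECT struct_field, field_1 FROM t")
    spark.table("v").printSchema() // now readable
    ```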
    
    ### Why are the changes needed?
    To avoid saving malformed views that no one can read.
    
    ### Does this PR introduce _any_ user-facing change?
    Yes, a view whose schema is not Hive compatible will now be saved in a
    non-Hive-compatible way, with an empty schema in the metastore, so that
    Spark can still read it.
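    
    Why can Spark still read it? A minimal sketch of the idea, with
    hypothetical helper names (the real catalog splits the JSON across
    numbered `spark.sql.sources.schema.*` property keys): the full schema is
    also serialized as JSON into the table properties, so the metastore copy
    of the schema can safely be left empty.
    
    ```scala
    import org.apache.spark.sql.types._
    
    // Hypothetical helpers mirroring the encode/decode round trip through
    // table properties.
    def schemaToProps(schema: StructType): Map[String, String] =
      Map("spark.sql.sources.schema" -> schema.json)
    
    def schemaFromProps(props: Map[String, String]): StructType =
      DataType.fromJson(props("spark.sql.sources.schema")).asInstanceOf[StructType]
    
    // A schema Hive cannot parse back (colon in a nested field name).
    val viewSchema = StructType(Seq(StructField("struct_field",
      StructType(Seq(StructField("colon:field_name", StringType))))))
    
    // The schema survives the round trip even if the metastore copy is empty.
    assert(schemaFromProps(schemaToProps(viewSchema)) == viewSchema)
    ```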
    
    ### How was this patch tested?
    Updated test case.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    No
    
    Closes #52730 from cravani/SPARK-54028.
    
    Authored-by: Chiran Ravani <[email protected]>
    Signed-off-by: Wenchen Fan <[email protected]>
    (cherry picked from commit 3d292dc7b1c5b5ff977c178a88f8ee73deaee88f)
    Signed-off-by: Wenchen Fan <[email protected]>
---
 .../spark/sql/hive/HiveExternalCatalog.scala       | 12 ++++++-
 .../spark/sql/hive/HiveMetastoreCatalogSuite.scala | 37 ++++++++++++++++++++++
 2 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
index 8cc7a773821d..e7b169c3ec69 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
@@ -587,7 +587,17 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
 
     if (tableDefinition.tableType == VIEW) {
       val newTableProps = tableDefinition.properties ++ tableMetaToTableProps(tableDefinition).toMap
-      val newTable = tableDefinition.copy(properties = newTableProps)
+      val schemaWithNoCollation = removeCollation(tableDefinition.schema)
+      val hiveCompatibleSchema =
+        // Spark-created views do not have to be Hive compatible. If the data type is not
+        // Hive compatible, we can set schema to empty so that Spark can still read this
+        // view as the schema is also encoded in the table properties.
+        if (schemaWithNoCollation.exists(f => !isHiveCompatibleDataType(f.dataType))) {
+          EMPTY_DATA_SCHEMA
+        } else {
+          schemaWithNoCollation
+        }
+      val newTable = tableDefinition.copy(schema = hiveCompatibleSchema, properties = newTableProps)
       try {
         client.alterTable(newTable)
       } catch {
diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala
index fad374827418..a7d43ebbef07 100644
--- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala
+++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala
@@ -438,6 +438,43 @@ class DataSourceWithHiveMetastoreCatalogSuite
     }
   }
 
+  test("SPARK-54028: Table and View with complex nested schema and ALTER 
operations") {
+    withTable("t") {
+      val schema =
+          "struct_field STRUCT<" +
+          "`colon:field_name`:STRING" +
+          ">"
+      sql("CREATE TABLE t (" + schema + ")")
+
+      // Verify initial table schema
+      assert(spark.table("t").schema === CatalystSqlParser.parseTableSchema(schema))
+
+      withView("v") {
+        sql("CREATE VIEW v AS SELECT `struct_field` FROM t")
+
+        // Verify view schema matches the original schema
+        val expectedViewSchema = CatalystSqlParser.parseTableSchema(schema)
+        assert(spark.table("v").schema === expectedViewSchema)
+
+        // Add new column to table
+        sql("ALTER TABLE t ADD COLUMN (field_1 INT)")
+
+        // Update schema string to include new column
+        val updatedSchema = schema + ",field_1 INT"
+
+        // Verify table schema after ALTER
+      assert(spark.table("t").schema === CatalystSqlParser.parseTableSchema(updatedSchema))
+
+        // Alter view to include new column
+        sql("ALTER VIEW v AS " +
+          "SELECT `struct_field`,`field_1` FROM t")
+
+        // Verify view schema after ALTER
+      assert(spark.table("v").schema === CatalystSqlParser.parseTableSchema(updatedSchema))
+      }
+    }
+  }
+
   test("SPARK-46934: Handle special characters in struct types with CTAS") {
     withTable("t") {
       val schema = "`a.b` struct<`a.b.b`:array<string>, `a b c`:map<int, string>>"
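
Note: the intent of the isHiveCompatibleDataType check exercised in the
first hunk can be sketched standalone. This is a hedged illustration, not
the real private helper in HiveExternalCatalog, whose exact rules may
differ; `hiveCompatible` below is a hypothetical name.

```scala
import org.apache.spark.sql.types._

// Hedged sketch: recursively reject types whose nested field names Hive's
// struct<name:type,...> schema strings cannot represent (e.g. ':' or ',').
def hiveCompatible(dt: DataType): Boolean = dt match {
  case st: StructType =>
    st.fields.forall { f =>
      !f.name.exists(c => c == ':' || c == ',') && hiveCompatible(f.dataType)
    }
  case ArrayType(elementType, _) => hiveCompatible(elementType)
  case MapType(keyType, valueType, _) =>
    hiveCompatible(keyType) && hiveCompatible(valueType)
  case _ => true
}

// The schema from the test above would be rejected, so the catalog falls
// back to EMPTY_DATA_SCHEMA and relies on the table properties instead.
val s = StructType(Seq(StructField("struct_field",
  StructType(Seq(StructField("colon:field_name", StringType))))))
assert(!hiveCompatible(s))
```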


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
