amogh-jahagirdar commented on code in PR #14407:
URL: https://github.com/apache/iceberg/pull/14407#discussion_r2468026849


##########
spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/TypeToSparkType.java:
##########
@@ -69,9 +70,24 @@ public DataType struct(Types.StructType struct, List<DataType> fieldResults) {
      if (field.doc() != null) {
        sparkField = sparkField.withComment(field.doc());
      }
+      if (field.writeDefault() != null) {
+        // Convert Iceberg default value to Spark SQL string representation. Spark stores default
+        // values as SQL strings in column metadata. The .sql() method formats literals correctly
+        // for each type

Review Comment:
   If we just remove the comment below for the initial default case, I'd have a single, more concise comment spanning both, above this if block: "Convert both write and initial default values to Spark SQL string literal representations on the StructField metadata"
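   
   For illustration, a hedged sketch of that consolidated form (the variable names and the `withCurrentDefaultValue` call are taken from the diff above; the `withExistenceDefaultValue` call for initial defaults is an assumption about the mirroring Spark setter, not code quoted from this PR):
   
      // Convert both write and initial default values to Spark SQL string literal
      // representations on the StructField metadata
      if (field.writeDefault() != null) {
        Object writeDefault = SparkUtil.internalToSpark(field.type(), field.writeDefault());
        sparkField =
            sparkField.withCurrentDefaultValue(Literal$.MODULE$.create(writeDefault, type).sql());
      }
      if (field.initialDefault() != null) {
        // assumed: StructField.withExistenceDefaultValue mirrors withCurrentDefaultValue
        Object initialDefault = SparkUtil.internalToSpark(field.type(), field.initialDefault());
        sparkField =
            sparkField.withExistenceDefaultValue(Literal$.MODULE$.create(initialDefault, type).sql());
      }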



##########
spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/TypeToSparkType.java:
##########
@@ -69,9 +70,24 @@ public DataType struct(Types.StructType struct, List<DataType> fieldResults) {
      if (field.doc() != null) {
        sparkField = sparkField.withComment(field.doc());
      }
+      if (field.writeDefault() != null) {
+        // Convert Iceberg default value to Spark SQL string representation. Spark stores default
+        // values as SQL strings in column metadata. The .sql() method formats literals correctly
+        // for each type
+        Object writeDefault = SparkUtil.internalToSpark(field.type(), field.writeDefault());
+        sparkField =
+            sparkField.withCurrentDefaultValue(Literal$.MODULE$.create(writeDefault, type).sql());
+      }
+      if (field.initialDefault() != null) {
+        // Same conversion for existence default values, used for existing rows when column is added
+        // to schema

Review Comment:
   I'd either remove this completely or say "Same conversion logic as write defaults but for initial defaults". No need to specify _when_ initial defaults (or in Spark's terms, existence defaults) are used in the code comment.



##########
spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/sql/TestSparkDefaultValues.java:
##########
@@ -0,0 +1,200 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.sql;
+
+import static org.assertj.core.api.Assertions.assertThat;
+import static org.assertj.core.api.Assertions.assertThatThrownBy;
+
+import org.apache.iceberg.ParameterizedTestExtension;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.expressions.Literal;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.spark.CatalogTestBase;
+import org.apache.iceberg.types.Types;
+import org.apache.spark.sql.AnalysisException;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.TestTemplate;
+import org.junit.jupiter.api.extension.ExtendWith;
+
+/**
+ * Tests for Spark SQL default values integration with Iceberg default values.
+ *
+ * <p>Note: These tests use {@code validationCatalog.createTable()} to create tables with default
+ * values because the Iceberg Spark integration does not yet support default value clauses in Spark
+ * DDL. See {@link #testCreateTableWithDefaultsNotYetSupported()} and {@link
+ * #testAlterTableAddColumnWithDefaultNotYetSupported()} for verification that DDL with defaults
+ * currently throws exceptions.
+ *
+ * <p>Partial column INSERT statements (e.g., {@code INSERT INTO table (col1) VALUES (val1)}) are
+ * not supported for DSV2 in Spark 4.0. See {@link #testPartialInsertNotYetSupportedInSpark()} for
+ * verification.
+ */
+@ExtendWith(ParameterizedTestExtension.class)
+public class TestSparkDefaultValues extends CatalogTestBase {
+
+  @AfterEach
+  public void dropTestTable() {
+    sql("DROP TABLE IF EXISTS %s", tableName);
+  }
+
+  @TestTemplate
+  public void testWriteDefaultWithExplicitDefault() {

Review Comment:
   Minor: you may want to add a test where we insert some records with an explicit `columns` list that differs from the table schema's column order and make sure the defaults are as expected.
   
   i.e.
   create table t1 (num1 int default 123, num2 int default 456)
   insert into t1 (num2, num1) values (DEFAULT, DEFAULT)
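   
   A hedged sketch of such a test, under the assumption that the table must be created through `validationCatalog` (per the class Javadoc, DDL default clauses are not yet supported) and using the `NestedField` builder and `assertEquals`/`row` helpers that this file's imports suggest; names are illustrative:
   
      @TestTemplate
      public void testDefaultsWithReorderedColumnList() {
        Schema schema =
            new Schema(
                Types.NestedField.optional("num1")
                    .withId(1)
                    .ofType(Types.IntegerType.get())
                    .withWriteDefault(Literal.of(123))
                    .build(),
                Types.NestedField.optional("num2")
                    .withId(2)
                    .ofType(Types.IntegerType.get())
                    .withWriteDefault(Literal.of(456))
                    .build());
        validationCatalog.createTable(
            tableIdent, schema, PartitionSpec.unpartitioned(), ImmutableMap.of());
   
        // the column list order (num2, num1) differs from the schema order (num1, num2)
        sql("INSERT INTO %s (num2, num1) VALUES (DEFAULT, DEFAULT)", tableName);
   
        assertEquals(
            "Each column should get its own write default regardless of column list order",
            ImmutableList.of(row(123, 456)),
            sql("SELECT num1, num2 FROM %s", tableName));
      }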



##########
spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/sql/TestSparkDefaultValues.java:
##########
@@ -0,0 +1,200 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.sql;
+
+import static org.assertj.core.api.Assertions.assertThat;
+import static org.assertj.core.api.Assertions.assertThatThrownBy;
+
+import org.apache.iceberg.ParameterizedTestExtension;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.expressions.Literal;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.spark.CatalogTestBase;
+import org.apache.iceberg.types.Types;
+import org.apache.spark.sql.AnalysisException;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.TestTemplate;
+import org.junit.jupiter.api.extension.ExtendWith;
+
+/**
+ * Tests for Spark SQL default values integration with Iceberg default values.
+ *
+ * <p>Note: These tests use {@code validationCatalog.createTable()} to create tables with default
+ * values because the Iceberg Spark integration does not yet support default value clauses in Spark
+ * DDL. See {@link #testCreateTableWithDefaultsNotYetSupported()} and {@link
+ * #testAlterTableAddColumnWithDefaultNotYetSupported()} for verification that DDL with defaults
+ * currently throws exceptions.
+ *
+ * <p>Partial column INSERT statements (e.g., {@code INSERT INTO table (col1) VALUES (val1)}) are
+ * not supported for DSV2 in Spark 4.0. See {@link #testPartialInsertNotYetSupportedInSpark()} for
+ * verification.
+ */
+@ExtendWith(ParameterizedTestExtension.class)
+public class TestSparkDefaultValues extends CatalogTestBase {
+
+  @AfterEach
+  public void dropTestTable() {
+    sql("DROP TABLE IF EXISTS %s", tableName);
+  }
+
+  @TestTemplate
+  public void testWriteDefaultWithExplicitDefault() {

Review Comment:
   Nit: this test and the one below look effectively the same to me; I'd keep only the one below.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

