YanivZalach opened a new issue, #15708:
URL: https://github.com/apache/iceberg/issues/15708

   ## Problem
   When a table schema contains a column named `ICEZVALUE`, running a Z-order
   rewrite produces a misleading error. When the rewrite is skipped (e.g.
   `min-input-files` not met), no error is thrown, making the bug inconsistent
   and hard to diagnose.
   
   Error seen when rewrite runs:
   ```
    Cannot write incompatible data for the table (The table name): Cannot find data for the output column `ICEZVALUE`
   ```
   This gives no indication that `ICEZVALUE` is a reserved internal name.
   
   ## Root Cause
   `SparkZOrderFileRewriteRunner` adds a column named `ICEZVALUE` to store the
   interleaved Z-order bytes. If a user column with the same name already exists,
   it is silently overwritten with binary data, and the write-back then fails.
   
   ## Steps to Reproduce
   
   **Environment:**
   - Iceberg: `1.4.2`
   - Spark: `3.5.1`
   
   ```python
   from datetime import datetime
   from pyspark.sql import Row, SparkSession
   
    spark = SparkSession.builder.getOrCreate()  # open or create the Spark session
   
   spark.sql("DROP TABLE IF EXISTS spark_catalog.default.check_table")
   spark.sql("""
       CREATE TABLE spark_catalog.default.check_table (
           time_col timestamp,
           col_a bigint,
           ICEZVALUE string
       )
       USING iceberg
       PARTITIONED BY (days(time_col))
       TBLPROPERTIES ('format-version' = '2')
   """)
   
   data = [
       Row(time_col=datetime(2024, 1, 1), col_a=1, ICEZVALUE="a"),
       Row(time_col=datetime(2024, 1, 2), col_a=2, ICEZVALUE="b"),
       Row(time_col=datetime(2024, 1, 3), col_a=3, ICEZVALUE="c"),
   ]
   spark.createDataFrame(data).coalesce(1).writeTo(
       "spark_catalog.default.check_table"
   ).append()
   
   # Pass 1: skipped - no error
   spark.sql("""
       CALL spark_catalog.system.rewrite_data_files(
           table => 'spark_catalog.default.check_table',
           strategy => 'sort',
           sort_order => 'zorder(col_a)',
           options => map('min-input-files', '2')
       )
   """)
   
   # Pass 2: runs - triggers the bug
   spark.sql("""
       CALL spark_catalog.system.rewrite_data_files(
           table => 'spark_catalog.default.check_table',
           strategy => 'sort',
           sort_order => 'zorder(col_a)',
           options => map('rewrite-all', 'true')
       )
   """)
   ```
   
   ## Actual Behavior
   - Pass 1 (rewrite skipped): no error, silent.
   - Pass 2 (rewrite runs): a misleading `CANNOT_FIND_DATA` `AnalysisException`.
   
   ## Expected Behavior
   A clear `IllegalArgumentException` thrown early, explaining that `ICEZVALUE`
   is a reserved internal column name used by Iceberg Z-order rewrite.
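
   A minimal sketch of what such an early check could look like (hypothetical
   helper, not Iceberg's actual API; `ValueError` stands in for the Java
   `IllegalArgumentException`, and the case-insensitive comparison assumes
   Spark's default case-insensitive column resolution):

   ```python
   # Internal column name added by SparkZOrderFileRewriteRunner.
   Z_COLUMN = "ICEZVALUE"

   def validate_zorder_schema(column_names):
       """Fail fast if a user column collides with the reserved internal name."""
       for name in column_names:
           if name.upper() == Z_COLUMN:
               raise ValueError(
                   f"Cannot Z-order: column '{name}' conflicts with the "
                   f"reserved internal column '{Z_COLUMN}' used by the "
                   "Iceberg Z-order rewrite"
               )
   ```

   Running this check at the start of the rewrite would surface the conflict
   in both passes, including the skipped one, instead of failing late with an
   unrelated-looking write error.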


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
