bk-mz opened a new issue, #15080:
URL: https://github.com/apache/iceberg/issues/15080

   ## Summary
   
   The `rewrite_position_delete_files` procedure fails with a `ValidationException` when run on tables that have array columns whose elements contain primitive fields, such as `ARRAY<STRUCT<value:BIGINT, count:INT>>`. This is a regression introduced in Iceberg 1.8.0.
   
   ## Error
   
   ```
   org.apache.iceberg.exceptions.ValidationException: Invalid partition field parent: list<struct<5: value: optional long, 6: count: optional int>>
        at org.apache.iceberg.PartitionSpec.checkCompatibility(PartitionSpec.java:674)
        at org.apache.iceberg.PartitionSpec.checkCompatibility(PartitionSpec.java:658)
        at org.apache.iceberg.PartitionSpec$Builder.add(PartitionSpec.java:514)
        at org.apache.iceberg.PartitionSpec$Builder.identity(PartitionSpec.java:542)
        at org.apache.iceberg.expressions.ExpressionUtil.lambda$identitySpec$5(ExpressionUtil.java:745)
        at java.base/java.lang.Iterable.forEach(Iterable.java:75)
        at org.apache.iceberg.expressions.ExpressionUtil.identitySpec(ExpressionUtil.java:744)
        at org.apache.iceberg.expressions.ExpressionUtil.extractByIdInclusive(ExpressionUtil.java:275)
        at org.apache.iceberg.spark.source.PositionDeletesRowReader.open(PositionDeletesRowReader.java:95)
   ```
   
   ## Root Cause
   
   Commit 9fb80b716 added validation in `PartitionSpec.checkCompatibility()` 
that partition field parents must be `StructType`. 
   
   When reading position deletes, `ExpressionUtil.nonConstantFieldIds()` collects ALL primitive field IDs from the table schema, including those nested inside arrays. `ExpressionUtil.identitySpec()` then attempts to create an identity partition field for each of these IDs. Validation fails for any field nested under a list: the list type named in the error message is an ancestor of the field, and it is not a struct.
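
The interaction described above can be illustrated with a small toy model. This is a sketch in Python, not Iceberg's Java code: the `Node` class and all function names are hypothetical, field IDs 1 to 4 are assumed (only 5 and 6 appear in the error message), and the check is assumed to walk up every ancestor, which is consistent with the error naming the list type rather than the inner struct.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """Toy schema node (hypothetical); kind is 'primitive', 'struct', or 'list'."""
    field_id: int
    name: str
    kind: str
    children: Tuple["Node", ...] = ()


# Schema from the reproduction. IDs 5 and 6 match the error message;
# IDs 1 to 4 are assumed for illustration.
SCHEMA = (
    Node(1, "id", "primitive"),
    Node(2, "data", "primitive"),
    Node(3, "items", "list", (
        Node(4, "element", "struct", (
            Node(5, "value", "primitive"),
            Node(6, "count", "primitive"),
        )),
    )),
)


def index_schema(top_level):
    """Return (node by ID, parent ID by child ID); top-level fields have no parent."""
    by_id, parent_of = {}, {}
    stack = [(node, None) for node in top_level]
    while stack:
        node, parent_id = stack.pop()
        by_id[node.field_id] = node
        if parent_id is not None:
            parent_of[node.field_id] = parent_id
        stack.extend((child, node.field_id) for child in node.children)
    return by_id, parent_of


def primitive_ids(by_id):
    """Collect every primitive field ID, including ones nested inside lists."""
    return sorted(i for i, node in by_id.items() if node.kind == "primitive")


def has_only_struct_ancestors(field_id, by_id, parent_of):
    """Toy version of the parent check: every ancestor must be a struct."""
    parent_id = parent_of.get(field_id)
    while parent_id is not None:
        if by_id[parent_id].kind != "struct":
            return False
        parent_id = parent_of.get(parent_id)
    return True


BY_ID, PARENT_OF = index_schema(SCHEMA)
ALL_PRIMITIVES = primitive_ids(BY_ID)
REJECTED = [i for i in ALL_PRIMITIVES
            if not has_only_struct_ancestors(i, BY_ID, PARENT_OF)]
PARTITIONABLE = [i for i in ALL_PRIMITIVES
                 if has_only_struct_ancestors(i, BY_ID, PARENT_OF)]
```

In this model `value` and `count` are collected as primitives but rejected by the ancestor check, while `id` and `data` pass. Skipping the rejected IDs before building the identity spec might be one direction for a fix, though whether that is the right change in Iceberg is not established here.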
   
   ## Reproduction
   
   Tables with array columns containing primitive fields trigger this bug:
   
   ```sql
   CREATE TABLE test_table (
     id BIGINT, 
     data STRING, 
     items ARRAY<STRUCT<value:BIGINT, count:INT>>
   ) USING iceberg 
   TBLPROPERTIES('format-version'='2', 'write.delete.mode'='merge-on-read');
   
   INSERT INTO test_table VALUES 
     (1, 'a', array(named_struct('value', cast(10 as bigint), 'count', 1))),
     (2, 'b', array(named_struct('value', cast(20 as bigint), 'count', 2)));
   
   DELETE FROM test_table WHERE id = 1;
   DELETE FROM test_table WHERE id = 2;
   
   -- This fails with ValidationException
   CALL system.rewrite_position_delete_files(
     table => 'test_table',
     options => map('rewrite-all', 'true'));
   ```
   
   ## Reproducer PR
   
   See PR #15079 for a test case that reproduces this issue.
   
   ## Environment
   
   - Iceberg version: 1.8.0+ (regression from 1.7.1)
   - Spark version: 3.5.x
   - Table format version: 2
   
   ## Workaround
   
   Use Iceberg 1.7.1 or earlier until this is fixed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

