bk-mz opened a new issue, #15080:
URL: https://github.com/apache/iceberg/issues/15080
## Summary
The `rewrite_position_delete_files` procedure fails with a
`ValidationException` when run on tables that have array columns whose
elements contain primitive fields, such as
`ARRAY<STRUCT<value:BIGINT, count:INT>>`. This is a regression introduced in
Iceberg 1.8.0.
## Error
```
org.apache.iceberg.exceptions.ValidationException: Invalid partition field parent: list<struct<5: value: optional long, 6: count: optional int>>
    at org.apache.iceberg.PartitionSpec.checkCompatibility(PartitionSpec.java:674)
    at org.apache.iceberg.PartitionSpec.checkCompatibility(PartitionSpec.java:658)
    at org.apache.iceberg.PartitionSpec$Builder.add(PartitionSpec.java:514)
    at org.apache.iceberg.PartitionSpec$Builder.identity(PartitionSpec.java:542)
    at org.apache.iceberg.expressions.ExpressionUtil.lambda$identitySpec$5(ExpressionUtil.java:745)
    at java.base/java.lang.Iterable.forEach(Iterable.java:75)
    at org.apache.iceberg.expressions.ExpressionUtil.identitySpec(ExpressionUtil.java:744)
    at org.apache.iceberg.expressions.ExpressionUtil.extractByIdInclusive(ExpressionUtil.java:275)
    at org.apache.iceberg.spark.source.PositionDeletesRowReader.open(PositionDeletesRowReader.java:95)
```
## Root Cause
Commit 9fb80b716 added a validation in `PartitionSpec.checkCompatibility()`
that requires partition field parents to be `StructType`.
When reading position deletes, `ExpressionUtil.nonConstantFieldIds()`
collects ALL primitive field IDs from the table schema, including those nested
inside arrays. Then `ExpressionUtil.identitySpec()` attempts to create identity
partitions for these fields, which fails validation because the parent type is
a list, not a struct.
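The failure mode described above can be illustrated with a small toy model. This is only a sketch in plain Python, not Iceberg code: the classes and the helpers `index_schema`, `primitive_ids`, and `has_only_struct_ancestors` are hypothetical names invented for illustration. Field IDs 5 and 6 come from the error message; IDs 1 through 4 are assumed. Walking every ancestor, rather than only the direct parent, is also an assumption about how the real check behaves.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Primitive:
    name: str


@dataclass
class Field:
    id: int
    name: str
    type: "Type"


@dataclass
class Struct:
    fields: List[Field]


@dataclass
class ListType:
    element: Field  # the list element carries its own field ID


Type = Union[Primitive, Struct, ListType]


def index_schema(root: Struct):
    """Return (field id -> type, child field id -> parent field id)."""
    types, parents = {}, {}

    def visit(field, parent_id):
        types[field.id] = field.type
        if parent_id is not None:
            parents[field.id] = parent_id
        if isinstance(field.type, Struct):
            for child in field.type.fields:
                visit(child, field.id)
        elif isinstance(field.type, ListType):
            visit(field.type.element, field.id)

    for top_level in root.fields:
        visit(top_level, None)
    return types, parents


def primitive_ids(types):
    """Collect every primitive field ID, including ones nested in lists."""
    return sorted(i for i, t in types.items() if isinstance(t, Primitive))


def has_only_struct_ancestors(field_id, types, parents):
    """Toy version of the parent check: every ancestor must be a struct."""
    parent = parents.get(field_id)
    while parent is not None:
        if not isinstance(types[parent], Struct):
            return False
        parent = parents.get(parent)
    return True


# Schema modelled on the reproduction table (IDs 1-4 are assumed).
schema = Struct([
    Field(1, "id", Primitive("long")),
    Field(2, "data", Primitive("string")),
    Field(3, "items", ListType(Field(4, "element", Struct([
        Field(5, "value", Primitive("long")),
        Field(6, "count", Primitive("int")),
    ])))),
])

types, parents = index_schema(schema)
candidates = primitive_ids(types)
rejected = [i for i in candidates
            if not has_only_struct_ancestors(i, types, parents)]
```

In this model all four primitive fields are collected as candidates, but `value` and `count` are rejected because the `items` list sits between them and the schema root. A fix along these lines would presumably skip such fields when collecting IDs instead of passing them on to the identity spec builder.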
## Reproduction
Tables with an array column whose element struct has primitive fields trigger this bug:
```sql
CREATE TABLE test_table (
  id BIGINT,
  data STRING,
  items ARRAY<STRUCT<value:BIGINT, count:INT>>
) USING iceberg
TBLPROPERTIES('format-version'='2', 'write.delete.mode'='merge-on-read');

INSERT INTO test_table VALUES
  (1, 'a', array(named_struct('value', cast(10 as bigint), 'count', 1))),
  (2, 'b', array(named_struct('value', cast(20 as bigint), 'count', 2)));

DELETE FROM test_table WHERE id = 1;
DELETE FROM test_table WHERE id = 2;

-- This fails with ValidationException
CALL system.rewrite_position_delete_files(
  table => 'test_table',
  options => map('rewrite-all', 'true'));
```
## Reproducer PR
See PR #15079 for a test case that reproduces this issue.
## Environment
- Iceberg version: 1.8.0+ (regression from 1.7.1)
- Spark version: 3.5.x
- Table format version: 2
## Workaround
Use Iceberg 1.7.1 or earlier until this is fixed.