haroldjimenez opened a new issue, #16217:
URL: https://github.com/apache/iceberg/issues/16217
### Apache Iceberg version
1.10.1 (latest release)
### Query engine
Spark
### Please describe the bug 🐞
Environment
* Iceberg version: 1.10.1
* Spark version: 3.5.8
* Catalog: spark_catalog with Hive Metastore (default)
Describe the bug
After running DROP PARTITION FIELD followed by DROP COLUMN (the correct order) on an identity partition field and its source column, the table enters an unrecoverable state:
* Querying the `.partitions` metadata table throws a `ValidationException`.
* Re-adding the dropped column with the same name throws `Cannot create identity partition sourced from different field in schema`.
* `CALL system.rewrite_manifests()` also fails, so there is no procedure-based recovery path.

The table can only be recovered by dropping and recreating it entirely.
Note: DROP PARTITION FIELD alone works correctly. The issue only occurs when
DROP COLUMN follows.
Steps to reproduce
```sql
-- 1. Create table with identity partition
CREATE TABLE spark_catalog.default.test_table (
event_id BIGINT,
event_date DATE,
event_hour INT,
user_id STRING
)
USING iceberg
PARTITIONED BY (event_date);
-- 2. Insert initial data under spec 0 (event_date only)
INSERT INTO spark_catalog.default.test_table VALUES
(1, DATE '2024-03-14', 9, 'user_A'),
(2, DATE '2024-03-15', 10, 'user_B');
-- 3. Add event_hour as identity partition field (spec 1)
ALTER TABLE spark_catalog.default.test_table
ADD PARTITION FIELD event_hour;
-- 4. Insert data under spec 1 (event_date + event_hour)
INSERT INTO spark_catalog.default.test_table VALUES
(3, DATE '2024-03-16', 14, 'user_C'),
(4, DATE '2024-03-16', 20, 'user_D');
-- 5. Drop the partition field (works fine)
ALTER TABLE spark_catalog.default.test_table
DROP PARTITION FIELD event_hour;
-- 6. Drop the source column (succeeds with no error)
ALTER TABLE spark_catalog.default.test_table
DROP COLUMN event_hour;
-- 7. Query partitions metadata → CRASH #1
SELECT * FROM spark_catalog.default.test_table.partitions;
-- 8. Try to re-add the column → CRASH #2
ALTER TABLE spark_catalog.default.test_table
ADD COLUMN event_hour INT;
-- 9. Try to recover via rewrite_manifests → CRASH #3
CALL spark_catalog.system.rewrite_manifests(
table => 'spark_catalog.default.test_table'
);
```
Expected behavior
* DROP COLUMN should either be rejected with a clear error while older partition specs that are still referenced in the manifest history use the column as a source, or the column removal should be handled gracefully so that `.partitions` remains queryable and the column can be re-added later.
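The first option above could look roughly like the following standalone Java sketch. It is a hypothetical model, not an existing Iceberg API: `DropColumnGuard`, `validateDrop`, and the column ids (event_date = 2, event_hour = 3, user_id = 4) are assumptions made for illustration. It rejects a column drop while any partition spec in the table's history, not only the current one, still uses that column as a source.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical guard, not an existing Iceberg API: refuse to drop a column while any
// partition spec in the table's history still uses it as a source column.
final class DropColumnGuard {

  // Simplified partition field: its name and the id of the column it reads.
  record PartitionField(String name, int sourceId) {}

  static void validateDrop(int columnId, String columnName,
                           Map<Integer, List<PartitionField>> specsById) {
    // Walk specs in id order so the error deterministically names the oldest offender.
    Map<Integer, List<PartitionField>> ordered = new TreeMap<>(specsById);
    for (Map.Entry<Integer, List<PartitionField>> spec : ordered.entrySet()) {
      for (PartitionField field : spec.getValue()) {
        if (field.sourceId() == columnId) {
          throw new IllegalArgumentException(String.format(
              "Cannot drop column %s: partition field %s in spec %d still references it",
              columnName, field.name(), spec.getKey()));
        }
      }
    }
  }

  public static void main(String[] args) {
    // Specs 0 and 1 from the reproduction; assumed ids: event_date=2, event_hour=3, user_id=4.
    Map<Integer, List<PartitionField>> specs = Map.of(
        0, List.of(new PartitionField("event_date", 2)),
        1, List.of(new PartitionField("event_date", 2), new PartitionField("event_hour", 3)));

    validateDrop(4, "user_id", specs); // allowed: no spec reads user_id
    try {
      validateDrop(3, "event_hour", specs); // rejected: spec 1 still reads it
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }

  private DropColumnGuard() {}
}
```

Checking every historical spec is the point of this design: in the reproduction the current spec no longer lists `event_hour`, but spec 1 still does, and data written under spec 1 is still tracked by the table.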
Related PR
* https://github.com/apache/iceberg/pull/14261
Actual behavior
Crash #1 (querying `.partitions` after DROP COLUMN):
```java
org.apache.iceberg.exceptions.ValidationException: Cannot find source column for partition field: 1001: event_hour: identity(1001)
    at org.apache.iceberg.PartitionSpec.checkCompatibility(PartitionSpec.java:661)
    at org.apache.iceberg.PartitionSpec$Builder.build(PartitionSpec.java:633)
    at org.apache.iceberg.BaseMetadataTable.transformSpec(BaseMetadataTable.java:83)
    at org.apache.iceberg.PartitionsTable.lambda$filteredManifests$4(PartitionsTable.java:230)
```
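The failing frame is `PartitionSpec.checkCompatibility`. The standalone Java sketch below models the idea behind that message under stated assumptions; it is not Iceberg's implementation, and `HistoricalSpecCheck`, its record, and the column ids are hypothetical. It shows how a spec that was valid when data was written (spec 1) fails validation once its source column no longer exists in the schema it is checked against.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of partition-spec validation; not Iceberg's real classes.
final class HistoricalSpecCheck {

  // One partition field: its name and the id of the schema column it reads.
  record PartitionField(String name, int sourceId) {}

  // Every field of a spec must resolve its source column id in the given schema
  // (modeled here as column id -> column name); otherwise validation fails.
  static void checkCompatibility(List<PartitionField> spec, Map<Integer, String> schemaById) {
    for (PartitionField field : spec) {
      if (!schemaById.containsKey(field.sourceId())) {
        throw new IllegalStateException(
            "Cannot find source column for partition field: " + field.name());
      }
    }
  }

  public static void main(String[] args) {
    // Assumed column ids: event_id=1, event_date=2, event_hour=3, user_id=4.
    List<PartitionField> spec1 = List.of(
        new PartitionField("event_date", 2),
        new PartitionField("event_hour", 3));
    Map<Integer, String> beforeDrop =
        Map.of(1, "event_id", 2, "event_date", 3, "event_hour", 4, "user_id");
    Map<Integer, String> afterDrop = Map.of(1, "event_id", 2, "event_date", 4, "user_id");

    checkCompatibility(spec1, beforeDrop); // passes: every source id resolves
    try {
      checkCompatibility(spec1, afterDrop); // spec 1 still needs column id 3
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }

  private HistoricalSpecCheck() {}
}
```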
Crash #2 (re-adding the column):
```java
java.lang.IllegalArgumentException: Cannot create identity partition sourced from different field in schema: event_hour
    at org.apache.iceberg.PartitionSpec$Builder.checkAndAddPartitionName(PartitionSpec.java:413)
    at org.apache.iceberg.TableMetadata.updateSpecSchema(TableMetadata.java:759)
```
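A plausible reading of Crash #2, assuming that re-adding a column assigns it a new column id rather than reusing the dropped one: the historical identity field `event_hour` still points at the old id, while the schema now contains a different column with the same name. The standalone Java sketch below is a simplified model of the condition named in the error, not Iceberg's code; `IdentityNameCheck` and the ids (old 3, new 5) are hypothetical.

```java
import java.util.Map;

// Hypothetical model of the identity-partition naming rule behind Crash #2; not Iceberg's code.
final class IdentityNameCheck {

  record PartitionField(String name, String transform, int sourceId) {}

  // An identity partition field may share its name with a schema column only when that
  // column is the field's own source, i.e. the column ids match.
  static void checkIdentityName(PartitionField field, Map<String, Integer> columnIdsByName) {
    Integer columnId = columnIdsByName.get(field.name());
    if ("identity".equals(field.transform())
        && columnId != null
        && columnId != field.sourceId()) {
      throw new IllegalArgumentException(
          "Cannot create identity partition sourced from different field in schema: "
              + field.name());
    }
  }

  public static void main(String[] args) {
    // Spec 1's field still points at event_hour's original column id (assumed to be 3).
    PartitionField historical = new PartitionField("event_hour", "identity", 3);

    // After DROP COLUMN, no column is named event_hour, so this check finds no conflict.
    checkIdentityName(historical, Map.of("event_id", 1, "event_date", 2, "user_id", 4));

    // ADD COLUMN event_hour is assumed to get a fresh id (5), so the names now collide.
    try {
      checkIdentityName(historical,
          Map.of("event_id", 1, "event_date", 2, "user_id", 4, "event_hour", 5));
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }

  private IdentityNameCheck() {}
}
```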
Crash #3: `rewrite_manifests` also fails with the same `ValidationException` as Crash #1, leaving no procedure-based recovery path.
### Willingness to contribute
- [ ] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from
the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time