[jira] [Created] (IMPALA-13853) Don't adjust Iceberg field IDs for Parquet files that don't have complex types

Jira Tue, 11 Mar 2025 03:09:11 -0700

Zoltán Borók-Nagy created IMPALA-13853:
------------------------------------------


             Summary: Don't adjust Iceberg field IDs for Parquet files that 
don't have complex types
                 Key: IMPALA-13853
                 URL: https://issues.apache.org/jira/browse/IMPALA-13853
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
            Reporter: Zoltán Borók-Nagy


In migrated Iceberg tables we can have data files with missing field IDs. We 
assume that their schema corresponds to the table schema at the point when the 
table migration happened. This means we can generate the field IDs at runtime. 
The logic is more complicated when there are complex types in the table 
and the table is partitioned. In such cases we need to make some adjustments 
during field ID generation, and while doing so we verify that the file schema 
corresponds to the table schema as it was at migration time.

This adjustment is not needed when the table doesn't have complex types, hence 
we can be a bit more relaxed and skip schema verification. This means Impala 
would still be able to read the table if there were some trivial schema changes 
(for example, an added column, as in the repro below) before migration.
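
A minimal sketch of the relaxed path, to make the idea concrete. This is not 
Impala's actual code: FileColumn, HasComplexTypes and AssignFlatFieldIds are 
hypothetical names, and the sketch assumes that in a flat schema the field ID 
of a column is simply its 1-based position in the file.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a top-level Parquet column.
struct FileColumn {
  std::string name;
  bool is_complex;  // true for struct, array or map columns
  int field_id;     // -1 means the file carries no field ID
};

// Returns true if any top-level column is a complex type.
bool HasComplexTypes(const std::vector<FileColumn>& cols) {
  for (const FileColumn& c : cols) {
    if (c.is_complex) return true;
  }
  return false;
}

// Assigns positional IDs 1..N to a flat file schema without comparing it
// to the table schema (assumption: position + 1 is the right ID).
// Returns false and leaves the columns untouched when complex types are
// present, so the caller can fall back to the stricter, verified path.
bool AssignFlatFieldIds(std::vector<FileColumn>& cols) {
  if (HasComplexTypes(cols)) return false;
  for (std::size_t i = 0; i < cols.size(); ++i) {
    cols[i].field_id = static_cast<int>(i) + 1;
  }
  return true;
}
```

Under these assumptions, a file that only contains column i gets field ID 1 
for it. Any column added to the table before migration has no matching field 
ID in the file, so it could be treated as missing instead of failing the query.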

Repro:
{noformat}
create table mig(i int) partitioned by (p int)
stored as parquet;

insert into mig partition(p) values (1, 2);

alter table mig add column j int;

alter table mig convert to iceberg;

select * from mig;
ERROR: Migrated file hdfs://localhost:20500/test-warehouse/mig/p=2/2449e393e29743d0-7ce9249300000000_1145218898_data.0.parq has unexpected schema or partitioning.
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
