Zoltán Borók-Nagy created IMPALA-13853:
------------------------------------------
Summary: Don't adjust Iceberg field IDs for Parquet files that
don't have complex types
Key: IMPALA-13853
URL: https://issues.apache.org/jira/browse/IMPALA-13853
Project: IMPALA
Issue Type: Bug
Components: Backend
Reporter: Zoltán Borók-Nagy
Migrated Iceberg tables can contain data files that have no field IDs. We
assume that the schema of such files corresponds to the table schema at the
time of the migration, so we can generate the field IDs at runtime. The logic
is more complicated when the table is partitioned and has complex types. In
such cases we need to adjust the IDs during field ID generation, and while
doing so we verify that the file schema corresponds to the table schema as it
was at migration time.
These adjustments are not needed when the table doesn't have complex types,
hence we can be more relaxed and skip the schema verification. Impala would
then still be able to read the table if there were trivial schema changes
before the migration, such as an added column.
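
As a rough illustration of the relaxed behaviour for flat schemas (a sketch
only, not Impala's actual backend code): assuming field IDs are assigned
positionally starting at 1, each top-level file column gets an ID, and table
columns with no matching ID in the file are simply treated as missing. The
function names and the ID layout below are hypothetical.

```python
from typing import Dict, List, Optional


def assign_flat_field_ids(file_columns: List[str]) -> Dict[str, int]:
    """Give each top-level file column a 1-based positional field ID.

    Assumption: the file schema is flat (no structs, arrays or maps), so
    there are no nested IDs to adjust.
    """
    return {name: pos for pos, name in enumerate(file_columns, start=1)}


def resolve_data_columns(
    table_data_columns: Dict[str, int], file_columns: List[str]
) -> Dict[str, Optional[int]]:
    """Map each non-partition table column to a file field ID, or None.

    None means the file has no such column (e.g. it was added after the
    file was written), so a reader would return NULL for it instead of
    rejecting the file.
    """
    ids_in_file = set(assign_flat_field_ids(file_columns).values())
    return {
        name: (field_id if field_id in ids_in_file else None)
        for name, field_id in table_data_columns.items()
    }


# Hypothetical layout for the table in the repro: the file was written
# before column j was added, so it only contains column i.
resolved = resolve_data_columns({"i": 1, "j": 2}, ["i"])
print(resolved)
```

In this sketch the older file resolves column i and leaves column j
unmatched, which is the case where skipping schema verification would let the
query succeed.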
Repro:
{noformat}
create table mig(i int) partitioned by (p int)
stored as parquet;
insert into mig partition(p) values (1, 2);
alter table mig add column j int;
alter table mig convert to iceberg;
select * from mig;
ERROR: Migrated file hdfs://localhost:20500/test-warehouse/mig/p=2/2449e393e29743d0-7ce9249300000000_1145218898_data.0.parq has unexpected schema or partitioning.
{noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)