[jira] [Commented] (IMPALA-13364) Schema resolution doesn't work for migrated partitioned Iceberg tables that have complex types

ASF subversion and git services (Jira) Wed, 12 Mar 2025 12:35:44 -0700


    [ https://issues.apache.org/jira/browse/IMPALA-13364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17934638#comment-17934638 ]


ASF subversion and git services commented on IMPALA-13364:
----------------------------------------------------------

Commit a49ff618f1137808c75cf81cccdadb980e89c34d in impala's branch 
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a49ff618f ]

IMPALA-13853: Don't adjust Iceberg field IDs for data files that don't have 
complex types

In migrated Iceberg tables we can have data files with missing field
IDs. We assume that their schema corresponds to the table schema at the
point when the table migration happened. This means that at runtime we
can generate the field IDs. The logic is more complicated when the table
is partitioned and has complex types. In such cases we need to adjust
the field ID generation, and we verify that the file schema corresponds
to the table schema.
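A minimal sketch of the adjusted generation, assuming a hypothetical tuple
encoding of schemas (not Impala's API) and a strict equality check as the
verification step (Impala's real check may differ). Top-level file columns
get IDs 1..n. The IDs that follow belong to the partition columns, which
exist in the table schema but not in the file, so they are skipped before
nested fields are numbered.

```python
from itertools import count

# Hypothetical type encoding (not Impala's API): a primitive is a string such
# as "int"; a struct is ("struct", [(name, type), ...]). Lists and maps are
# omitted to keep the sketch short.


def verify_file_schema(file_columns, table_columns, num_partition_cols):
    """Assumed strict check: the file must equal the non-partition columns."""
    expected = table_columns[: len(table_columns) - num_partition_cols]
    return file_columns == expected


def assign_ids_for_legacy_file(file_columns, num_partition_cols):
    """Generate field IDs for a data file that has none in its metadata."""
    next_id = count(1)
    ids = {}

    def visit_struct(fields, prefix, skipped=0):
        for name, _ in fields:  # number every field at this level first
            ids[prefix + name] = next(next_id)
        for _ in range(skipped):
            next(next_id)  # ID owned by a partition column in the table schema
        for name, typ in fields:  # then descend into nested structs
            if not isinstance(typ, str):
                visit_struct(typ[1], prefix + name + ".")

    # Partition columns are top-level primitives, so only the top level skips IDs.
    visit_struct(file_columns, "", skipped=num_partition_cols)
    return ids


file_columns = [("id", "int"), ("s", ("struct", [("a", "int"), ("b", "string")]))]
table_columns = file_columns + [("p", "string")]  # one partition column, last

assert verify_file_schema(file_columns, table_columns, num_partition_cols=1)
ids = assign_ids_for_legacy_file(file_columns, num_partition_cols=1)
print(ids)  # {'id': 1, 's': 2, 's.a': 4, 's.b': 5}
```

Here ID 3 is left for the partition column "p", so the nested fields line up
with the IDs they carry in the table schema.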

These adjustments are not needed when the table doesn't have complex
types, so we can be more relaxed and skip schema verification, because
field ID generation for top-level columns is not affected. This means
Impala can still read the table if there were trivial schema changes
before migration.
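A sketch of the relaxed path, under the same hypothetical tuple encoding. When
every column is primitive, the field IDs are simply the column positions, and
partition columns (always last) cannot shift them. The int to bigint widening
below is an assumed example of a "trivial" pre-migration change; which
changes Impala actually accepts as compatible is not spelled out here.

```python
# Hypothetical type encoding (not Impala's API): a primitive is a string such
# as "int"; anything else (e.g. a ("struct", [...]) tuple) is a complex type.


def has_complex_types(columns):
    return any(not isinstance(typ, str) for _, typ in columns)


def assign_ids_relaxed(file_columns):
    """Positional field IDs for a file whose columns are all primitives."""
    if has_complex_types(file_columns):
        raise ValueError("complex types present: use adjusted ID generation")
    return {name: pos for pos, (name, _) in enumerate(file_columns, start=1)}


# The file was written while "id" was still an int; the Hive table was later
# altered to bigint, then migrated, and it has a partition column "p".
file_columns = [("id", "int"), ("name", "string")]
table_columns = [("id", "bigint"), ("name", "string"), ("p", "string")]

file_ids = assign_ids_relaxed(file_columns)
table_ids = {name: pos for pos, (name, _) in enumerate(table_columns, start=1)}
print(file_ids)  # {'id': 1, 'name': 2}
# Every file column gets the same ID it has in the table schema.
assert all(table_ids[name] == fid for name, fid in file_ids.items())
```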

With this change we allow all data files whose schema is compatible
with the table schema, which was the behavior before IMPALA-13364. This
behavior is also aligned with Hive.

Testing:
 * e2e tests added for both Parquet and ORC files

Change-Id: Ib1f1d0cf36792d0400de346c83e999fa50c0fa67
Reviewed-on: http://gerrit.cloudera.org:8080/22610
Reviewed-by: Daniel Becker <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Schema resolution doesn't work for migrated partitioned Iceberg tables that 
> have complex types
> ----------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-13364
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13364
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-iceberg
>             Fix For: Impala 4.5.0
>
>
> Schema resolution doesn't work correctly for migrated partitioned Iceberg 
> tables that have complex types.
> When we encounter a Parquet/ORC file in an Iceberg table that doesn't have field 
> IDs in the file metadata, we assume that it is an old data file written before 
> migration and that its schema is the very first one, hence we can mimic Iceberg's 
> field ID generation to assign field IDs to the file schema elements.
> This process didn't take the partition columns into account. This only 
> matters when there are complex types in the table: partition columns are 
> always the last columns in legacy Hive tables, and field IDs are assigned via 
> a "BFS-like" traversal. That is, if there are only primitive types in the table 
> there is no problem, but the children of complex type columns are 
> assigned incorrect IDs.
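The shift described in the issue can be reproduced with a small sketch. It
loosely models Iceberg's ID assignment (all fields of a struct are numbered
first, then their nested types are visited in order), restricted to structs
and lists; the tuple encoding and the function name are hypothetical, and
Iceberg's real implementation is in Java and also handles maps.

```python
from itertools import count

# Hypothetical type encoding for illustration only: a primitive is a string
# such as "int"; a struct is ("struct", [(name, type), ...]); a list is
# ("list", element_type).


def assign_field_ids(columns):
    """Assign field IDs in a "BFS-like" order and return {dotted_path: id}."""
    next_id = count(1)
    ids = {}

    def visit_struct(fields, prefix):
        for name, _ in fields:  # number every field at this level first
            ids[prefix + name] = next(next_id)
        for name, typ in fields:  # then descend into nested types
            visit_type(typ, prefix + name + ".")

    def visit_type(typ, prefix):
        if isinstance(typ, str):
            return  # primitive: nothing nested
        if typ[0] == "struct":
            visit_struct(typ[1], prefix)
        elif typ[0] == "list":
            ids[prefix + "element"] = next(next_id)
            visit_type(typ[1], prefix + "element.")
        else:
            raise ValueError(f"unsupported type: {typ[0]}")

    visit_struct(columns, "")
    return ids


# A legacy data file stores only the non-partition columns ...
file_columns = [("id", "int"), ("s", ("struct", [("a", "int"), ("b", "string")]))]
# ... while the table schema has the partition column "p" appended last.
table_columns = file_columns + [("p", "string")]

file_ids = assign_field_ids(file_columns)
table_ids = assign_field_ids(table_columns)
print(file_ids)   # {'id': 1, 's': 2, 's.a': 3, 's.b': 4}
print(table_ids)  # {'id': 1, 's': 2, 'p': 3, 's.a': 4, 's.b': 5}
```

The top-level columns agree, but every nested field generated from the file
alone is off by one per partition column.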



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
