[ 
https://issues.apache.org/jira/browse/IMPALA-12605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-12605:
---------------------------------------
    Description: 
Impala's ALTER TABLE SET PARTITION SPEC reuses field ids of old partition specs.

This can result in partition field id collisions between the old and new specs.

Repro:

{noformat}
CREATE TABLE ice_t (i int, p int) PARTITIONED BY SPEC (TRUNCATE(10, p)) STORED BY ICEBERG;

ALTER TABLE ice_t SET PARTITION SPEC (TRUNCATE(100, p));
{noformat}

The ALTER TABLE statement creates another partition spec for the table, but 
its partition field gets the same field id as the field in the old partition 
spec.

A workaround is to use the VOID transform:
{noformat}
ALTER TABLE ice_t SET PARTITION SPEC (VOID(p), TRUNCATE(100, p));
{noformat}

But Impala should automatically assign new partition field ids in the new spec. 
This is especially important for Iceberg V2 tables, where last-partition-id is a 
required field in the metadata. The Iceberg library should handle partition 
evolution correctly, so it seems we are using the wrong APIs for it.
For reference, Hive has the same ALTER TABLE SET PARTITION SPEC syntax, but it 
correctly creates the new partition spec.
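
To make the collision concrete, here is a toy Python model of partition field id assignment. It is only a sketch, not Impala or Iceberg code: the class and function names are hypothetical, and it assumes that the buggy path numbers fields by position from 1000 without looking at older specs, which would also explain why the VOID workaround helps. The evolve_spec function shows the expected behavior: reuse the id of an equivalent historical field, otherwise allocate last-partition-id + 1.

```python
from dataclasses import dataclass

# Iceberg numbers partition fields starting at 1000.
FIRST_PARTITION_FIELD_ID = 1000


@dataclass(frozen=True)
class PartitionField:
    """Hypothetical stand-in for one field of a partition spec."""
    field_id: int
    source: str
    transform: str


def build_spec_from_scratch(fields):
    """Number fields by position, ignoring table history (assumed buggy pattern)."""
    return [PartitionField(FIRST_PARTITION_FIELD_ID + i, src, tr)
            for i, (src, tr) in enumerate(fields)]


def evolve_spec(existing_specs, fields):
    """Reuse the id of an equivalent historical field, else allocate a new one."""
    known = {(f.source, f.transform): f.field_id
             for spec in existing_specs for f in spec}
    # Highest id handed out so far, i.e. the table's last-partition-id.
    last_id = max(known.values(), default=FIRST_PARTITION_FIELD_ID - 1)
    new_spec = []
    for src, tr in fields:
        if (src, tr) in known:
            field_id = known[(src, tr)]
        else:
            last_id += 1
            field_id = last_id
            known[(src, tr)] = field_id
        new_spec.append(PartitionField(field_id, src, tr))
    return new_spec


old_spec = build_spec_from_scratch([("p", "truncate[10]")])
buggy = build_spec_from_scratch([("p", "truncate[100]")])
workaround = build_spec_from_scratch([("p", "void"), ("p", "truncate[100]")])
evolved = evolve_spec([old_spec], [("p", "truncate[100]")])

print(old_spec[0].field_id, buggy[0].field_id)  # 1000 1000: same id, different transforms
print([f.field_id for f in workaround])         # [1000, 1001]
print(evolved[0].field_id)                      # 1001
```

In this model the VOID field takes over the old id, so TRUNCATE(100, p) lands on a fresh one, which matches what the workaround achieves.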


> ALTER TABLE SET PARTITION SPEC reuses field ids of old partition specs
> ----------------------------------------------------------------------
>
>                 Key: IMPALA-12605
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12605
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-iceberg


