[
https://issues.apache.org/jira/browse/IMPALA-12605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zoltán Borók-Nagy updated IMPALA-12605:
---------------------------------------
Description:
Impala's ALTER TABLE SET PARTITION SPEC reuses field ids of old partition specs.
This can cause partition field id collisions between the old and new specs.
Repro:
{noformat}
CREATE TABLE ice_t (i int, p int) PARTITIONED BY SPEC (TRUNCATE(10, p)) STORED
BY ICEBERG;
ALTER TABLE ice_t SET PARTITION SPEC (TRUNCATE(100, p));
{noformat}
The second statement (ALTER TABLE) creates a new partition spec for the table,
but its partition field gets the same field id as the field in the old
partition spec.
A workaround is to use the VOID transform:
{noformat}
ALTER TABLE ice_t SET PARTITION SPEC (VOID(p), TRUNCATE(100, p));
{noformat}
But Impala should automatically assign new partition field ids in the new spec.
This matters especially for Iceberg V2 tables, where last-partition-id is a
required field in the metadata. The Iceberg library should handle partition
evolution correctly; it seems we are using the wrong APIs for partition
evolution.
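To make the collision concrete, here is a minimal toy model in Python. It is
not Impala's or Iceberg's actual code: the names PartitionField,
assign_positional, assign_evolving and collisions are hypothetical, and the
"positional" scheme is only an assumed approximation of the buggy behaviour.
The one detail taken from Iceberg is that partition field ids start at 1000.

```python
from dataclasses import dataclass

# Iceberg numbers partition fields starting at 1000.
PARTITION_ID_START = 1000


@dataclass(frozen=True)
class PartitionField:
    field_id: int
    source: str     # source column name
    transform: str  # e.g. "truncate[10]"


def assign_positional(fields):
    """Assumed buggy scheme: number fields by position, ignoring older specs."""
    return [PartitionField(PARTITION_ID_START + i, src, tr)
            for i, (src, tr) in enumerate(fields)]


def assign_evolving(fields, old_specs):
    """Reuse the id of an equivalent old field, else take last-partition-id + 1."""
    known = {(f.source, f.transform): f.field_id
             for spec in old_specs for f in spec}
    last_id = max(known.values(), default=PARTITION_ID_START - 1)
    result = []
    for src, tr in fields:
        if (src, tr) not in known:
            last_id += 1
            known[(src, tr)] = last_id
        result.append(PartitionField(known[(src, tr)], src, tr))
    return result


def collisions(old_specs, new_spec):
    """Ids used in new_spec that an old spec already uses for a different field."""
    old = {f.field_id: (f.source, f.transform)
           for spec in old_specs for f in spec}
    return sorted(f.field_id for f in new_spec
                  if f.field_id in old
                  and old[f.field_id] != (f.source, f.transform))


# Spec created by CREATE TABLE ... PARTITIONED BY SPEC (TRUNCATE(10, p))
spec0 = assign_positional([("p", "truncate[10]")])
# Spec created by ALTER TABLE ... SET PARTITION SPEC (TRUNCATE(100, p))
buggy = assign_positional([("p", "truncate[100]")])
fixed = assign_evolving([("p", "truncate[100]")], [spec0])
```

In this model, collisions([spec0], buggy) reports id 1000, because TRUNCATE(100, p)
reuses the id of TRUNCATE(10, p), while the evolving scheme gives the new field
id 1001 and reports no collision.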
For reference, Hive has the same ALTER TABLE SET PARTITION SPEC syntax, but it
is able to correctly create the new partition spec.
was:
Impala's ALTER TABLE SET PARTITION SPEC reuses field ids of old partition specs.
This can result in having collisions of partition fields.
Repro:
{noformat}
CREATE TABLE ice_t (i int, p int) PARTITIONED BY SPEC (TRUNCATE(10, p)) STORED
BY ICEBERG;
ALTER TABLE ice_t SET PARTITION SPEC (TRUNCATE(100, p));
{noformat}
The latter ALTER TABLE statement will create another partition spec for the
table, but the partition field will have the same field id as the old partition
spec's field id.
Workaround for this is to use the VOID transform:
{noformat}
ALTER TABLE ice_t SET PARTITION SPEC (VOID(p), TRUNCATE(100, p));
{noformat}
But Impala should automatically assign new partition field ids in the new spec.
This is especially true for Iceberg V2 tables, where last-partition-id is a
required field in the metadata. The Iceberg library should handle partition
evolution correctly, seems like we are using the wrong APIs for partition
evolution.
For reference, Hive has the same ALTER TABLE SET PARTITION SPEC syntax, but it
is able to correctly create the new partition spec.
> ALTER TABLE SET PARTITION SPEC reuses field ids of old partition specs
> ----------------------------------------------------------------------
>
> Key: IMPALA-12605
> URL: https://issues.apache.org/jira/browse/IMPALA-12605
> Project: IMPALA
> Issue Type: Bug
> Reporter: Zoltán Borók-Nagy
> Priority: Major
> Labels: impala-iceberg