[
https://issues.apache.org/jira/browse/HIVE-28136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829518#comment-17829518
]
Butao Zhang edited comment on HIVE-28136 at 3/21/24 11:58 AM:
--------------------------------------------------------------
Hive's alter primary key syntax needs to store the primary key columns and their
alias in HMS:
The Hive-style statements below will store column *a* and its alias *pk1*
in HMS. However, Iceberg does not need to store the identifier field in HMS,
as it is not necessary, and that is what I did in HIVE-28015
([https://github.com/apache/hive/pull/5047/files#diff-7c252c3ca08c9b9604de9899cf2637d9ea9fb5134eadc6c8549fa612c0f1b48dR272]).
This is similar to Iceberg partitions, which we also don't store in HMS.
{code:java}
alter table tbl_ice add constraint pk1 primary key (a) disable novalidate rely;
alter table tbl_ice drop constraint pk1;{code}
But if we want to follow the Hive-style alter primary key syntax, we should also
store the Iceberg identifier fields in HMS, and we need to make sure the
identifier fields (primary key) in the Iceberg metadata files and in HMS stay consistent.
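To make the consistency requirement concrete, here is a minimal sketch of such a check. It is only an illustration: the class `IdentifierFieldConsistency` and its `mismatch` method are hypothetical names, not existing Hive or Iceberg code, and the case-insensitive comparison is my assumption about how the two sides should be matched.

```java
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not part of Hive or Iceberg.
final class IdentifierFieldConsistency {

  private IdentifierFieldConsistency() {}

  // Assumption: names are compared case-insensitively, so lower-case them first.
  private static Set<String> normalize(Collection<String> names) {
    Set<String> out = new TreeSet<>();
    for (String name : names) {
      out.add(name.toLowerCase(Locale.ROOT));
    }
    return out;
  }

  /**
   * Returns the column names that appear on exactly one side
   * (the symmetric difference). An empty result means the PK columns
   * recorded in HMS and the Iceberg identifier fields agree.
   */
  static Set<String> mismatch(Collection<String> hmsPkColumns,
                              Collection<String> icebergIdentifierFields) {
    Set<String> hms = normalize(hmsPkColumns);
    Set<String> ice = normalize(icebergIdentifierFields);

    Set<String> diff = new TreeSet<>(hms);
    diff.addAll(ice);            // union of both sides

    Set<String> common = new TreeSet<>(hms);
    common.retainAll(ice);       // intersection of both sides

    diff.removeAll(common);      // union minus intersection
    return diff;
  }

  public static void main(String[] args) {
    // Both sides agree on column "a".
    System.out.println(mismatch(List.of("a"), List.of("a")));          // prints []
    // HMS lists "data" as a PK column, Iceberg does not.
    System.out.println(mismatch(List.of("id", "data"), List.of("id"))); // prints [data]
  }
}
```

A check like this would have to run on every alter that touches the PK on either side, which is the extra maintenance cost of this option.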
Another way is to implement a new syntax like Spark's Iceberg primary key syntax,
which avoids storing the Iceberg primary key in HMS.
[https://iceberg.apache.org/docs/latest/spark-ddl/#alter-table-set-identifier-fields]
{code:java}
ALTER TABLE prod.db.sample SET IDENTIFIER FIELDS id
-- single column
ALTER TABLE prod.db.sample SET IDENTIFIER FIELDS id, data
-- multiple columns
ALTER TABLE prod.db.sample DROP IDENTIFIER FIELDS id
-- single column
ALTER TABLE prod.db.sample DROP IDENTIFIER FIELDS id, data
-- multiple columns{code}
[~dkuzmenko] Which way do you think is better? Or is there another, better
approach we could choose?
> Iceberg: Add support for altering PK
> ------------------------------------
>
> Key: HIVE-28136
> URL: https://issues.apache.org/jira/browse/HIVE-28136
> Project: Hive
> Issue Type: Improvement
> Reporter: Denys Kuzmenko
> Priority: Major
>
> # Allow schema evolution affecting identifier-field-ids (e.g. rename or drop
> a column that is also a PK)
> # Allow changing the identifier-field-ids once the table is created (e.g. add
> or remove a column to the PK)
> Hive already supports those commands; we just need to update
> `identifier-field-ids`:
> {code}
> alter table tbl_ice add constraint pk1 primary key (a) disable novalidate rely;
> alter table tbl_ice drop constraint pk1;
> {code}