[GitHub] [iceberg] openinx commented on pull request #2354: Core: add row key to format v2

GitBox Sun, 28 Mar 2021 20:06:40 -0700


openinx commented on pull request #2354:
URL: https://github.com/apache/iceberg/pull/2354#issuecomment-809033279



   I think we should keep track of multiple versions of the row identifier in
the Apache Iceberg schema. Let's discuss the case you described: adding
`profile_id` to a table previously identified only by `account_id`.
   
   t1:   The user defines `account_id` as the row identifier;
   t2:   Write a few records into the table;
   t3:   Write a few equality deletions (by `account_id`) into the table;
   t4:   Add `profile_id` to the row identifier, so the identifier is now
`account_id` & `profile_id`;
   t5:   Write a few equality deletions (by `account_id` & `profile_id`) into
the table;
   
   In my opinion, the Iceberg table format's row identifier specification is
being introduced because we expect standard SQL's `PRIMARY KEY` to be mapped
to those row identifier columns automatically.
   
   (
   If we don't have the row identifier spec, then we don't know how to track
those keys when creating a table like:
   
   ```sql
   CREATE TABLE sample(id INT, data STRING,  PRIMARY KEY (id) NOT ENFORCED);
   ```
   )
   
   Back to the case above: at timestamps `t4` & `t5`, the table's row
identifier is `account_id` & `profile_id`. If people want to read the snapshot
at timestamp `t3`, then we should use the row identifier `account_id`. So if
we don't track multiple versions of the identifier, how could we read the row
identifier for old snapshots? Using the latest identifier, `account_id` &
`profile_id`, would confuse people a lot, because those rows were actually
deleted by the field `account_id` alone.
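
To make the time-travel argument concrete, here is a minimal sketch in plain
Python. It is not the Iceberg API: `RowIdentifierHistory`, `set_identifier`,
`as_of`, and `latest` are hypothetical names used only for illustration, and
the integer timestamps stand in for `t1` .. `t5` above. It assumes each
identifier change is recorded with the time it took effect.

```python
import bisect


class RowIdentifierHistory:
    """Hypothetical model: keeps every version of the row identifier."""

    def __init__(self):
        self._timestamps = []  # time each version took effect, ascending
        self._versions = []    # tuple of identifier columns per version

    def set_identifier(self, timestamp, columns):
        # Versions must be recorded in increasing time order so that
        # the lookup in as_of() can use binary search.
        if self._timestamps and timestamp <= self._timestamps[-1]:
            raise ValueError("identifier changes must be in time order")
        self._timestamps.append(timestamp)
        self._versions.append(tuple(columns))

    def as_of(self, timestamp):
        # Find the last version whose effective time is <= timestamp.
        idx = bisect.bisect_right(self._timestamps, timestamp) - 1
        if idx < 0:
            raise LookupError("no row identifier defined at that time")
        return self._versions[idx]

    def latest(self):
        # What a reader would see if only the newest version were kept.
        return self._versions[-1]


history = RowIdentifierHistory()
history.set_identifier(1, ["account_id"])                # t1
history.set_identifier(4, ["account_id", "profile_id"])  # t4

print(history.as_of(3))   # identifier in effect for the snapshot at t3
print(history.as_of(5))   # identifier in effect for the snapshot at t5
print(history.latest())   # differs from as_of(3): the mismatch described above
```

With only `latest()` available, a reader of the `t3` snapshot would be told
the identifier is `account_id` & `profile_id`, even though the deletions at
`t3` were keyed by `account_id` alone.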
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
