yyanyy commented on pull request #2275:
URL: https://github.com/apache/iceberg/pull/2275#issuecomment-813861859


   > @yyanyy thank you for responding to my question. But because I am slow, I 
needed to do some testing to understand for myself the effects of #2096 and 
this PR.
   > I started with Iceberg 0.11.0 and created some tables, inserting data into 
them, and altering the schema. Then I added the commit for #2096 and the 
changes here on top (actually I added the commit for #2089 first, so that the 
backports are clean). I made further updates to my Iceberg tables, adding data 
and altering the schema.
   > From what I see, when the table is updated and a new metadata.json file 
for the table is written, the metadata.json file still has a `schema` field, 
but the schema now has a `schema-id` field; it also has a `schemas` field 
containing an array of schemas, and a `current-schema-id` field. The only 
schema known at the time of the switchover is the current schema of the table, 
so if the table change is just change to data, the existing schema is given a 
`schema-id` of 0, and the schemas array contains a single schema; if the table 
change is a schema change, then the old schema is given a `schema-id` of 0, the 
new schema is given a `schema-id` of 1, and the `schemas` array contains two 
schemas (and the `current-schema-id` field is 1). In the `snapshots` field, any 
snapshot created before the switchover does not have a `schema-id`; snapshots 
created after the switchover are given a `schema-id`, corresponding to the 
schema current at the time the snapshot is written.
   > When `Snapshot#schemaId()` is called, if the snapshot is written before 
the switchover, null is returned, and if written after the switchover, a 
non-null Integer is returned.
   > It is as you wrote.
   > With this change, I think it is straightforward to update my PR, #1508. I 
just need to update the following method I'm adding to `BaseTable`
   > 
   > ```
   >   public Schema schemaForSnapshot(long snapshotId) {
   >     TableMetadata current = ops.current();
   >     // First, check if the snapshot has a schema id associated with it
   >     Snapshot snapshot = current.snapshot(snapshotId);
   >     Integer schemaId = snapshot.schemaId();
   >     if (schemaId != null) {
   >       return current.schemasById().get(schemaId);
   >     }
   >     // Otherwise, read each of the previous metadata files until we find 
one whose current
   >     // snapshot id is the snapshot id
   >     ...
   >   }
   > ```
   > 
   > I rebuilt Iceberg with my change on top of the previous changes, and I was 
able to see that the correct schema is used when viewing any snapshot (time 
travel).
   > When this change is merged, I will update my PR. I hope that it can be 
considered then.
   
   Thank you for spending time verifying the changes, and described the steps 
here! 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to