syun64 commented on issue #6978:
URL: https://github.com/apache/iceberg/issues/6978#issuecomment-1451946473

   Hi @nastra, I really appreciate you taking the time to look at this PR. 
I was a bit confused at first, and it took some time to identify the issue and 
relate it to schema evolution, so here's an attempt at distilling that 
confusion for anyone else taking a look at this PR...
   
   Firstly, my understanding is that the concept of snapshot_id is consistent 
between the actual Iceberg table and the state of its metadata tables at a 
given point in time. Time travel on metadata tables based on the snapshot_id 
is an advertised feature of Iceberg in its docs.
   
   Your observation that the schema of the filesTable didn't evolve, and that 
this is the cause of the issue, is absolutely correct. But I want to 
distinguish between the schema_id of the actual Iceberg table, the schema_id 
of the metadata table, and the snapshot_ids they share. Here's a table 
describing the timeline of events in my test:
   
   | event                  | create table | add row (id, data) | add column, add row (id, data, data2) |
   |------------------------|--------------|--------------------|---------------------------------------|
   | snapshot id (random)   | 6234         | 9023               | 8234                                  |
   | actual table schema id | 0            | 0                  | 1                                     |
   | files table schema id  | 0            | 0                  | 0                                     |
   
   The schema ID of the actual table does not change until there is a schema 
evolution. This is consistent with @szehon-ho's observation that time travel 
queries on the metadata table work when there is no schema evolution, even 
after you add an extra row and create a new snapshot.
   
   However, when the schema evolves in the actual table, #1508 makes a strong 
assumption that we only need to look at the schema ID of the actual table, so 
we use that schema ID to read the files metadata table instead of the schema 
ID of the files table at that snapshot. Interestingly, we can still query the 
files table for the first two snapshots in this series of events (6234 and 
9023) even after the table has evolved, because at those snapshots the schema 
IDs of the actual table and its metadata tables are the same.
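
   To make the mismatch concrete, here is a minimal toy model of the timeline 
above. It is only a sketch of my understanding, not Iceberg code: the 
dictionaries, the function names and the schema placeholder string are all 
hypothetical, and the "files schema ID" lookup is just one way the correct 
behaviour could look.

```python
# Toy model of the timeline in the table above.
# Every name here is hypothetical; none of this is Iceberg code or API.

# Schema ID of the actual table at each snapshot.
TABLE_SCHEMA_ID = {6234: 0, 9023: 0, 8234: 1}

# Schema ID of the files metadata table at each snapshot (it never evolved).
FILES_SCHEMA_ID = {6234: 0, 9023: 0, 8234: 0}

# Schemas known to the files table, keyed by its own schema ID.
FILES_SCHEMAS = {0: "files-table-schema-v0"}


def resolve_with_table_schema_id(snapshot_id):
    """Assumed problematic path: use the actual table's schema ID
    to look up a schema of the files table. Returns None on a miss."""
    return FILES_SCHEMAS.get(TABLE_SCHEMA_ID[snapshot_id])


def resolve_with_files_schema_id(snapshot_id):
    """Assumed intended path: use the files table's own schema ID
    at that snapshot."""
    return FILES_SCHEMAS.get(FILES_SCHEMA_ID[snapshot_id])


if __name__ == "__main__":
    for snapshot_id in (6234, 9023, 8234):
        print(
            snapshot_id,
            resolve_with_table_schema_id(snapshot_id),
            resolve_with_files_schema_id(snapshot_id),
        )
```

   In this model the first lookup succeeds for 6234 and 9023, where both 
schema IDs are 0, and misses only for 8234, where the actual table has moved 
to schema ID 1 but the files table has not.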


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

