rdblue commented on pull request #2354:
URL: https://github.com/apache/iceberg/pull/2354#issuecomment-810410893


   @openinx, I think I understand what you're asking: if we track the schema 
for a snapshot, why not also track the partition spec, sort order, and row 
identifier information?
   
   The reason is that the partition spec, sort order, and row identifier do not 
affect existing snapshots. Those are used when writing, not when reading. When 
reading, we always use whatever the data files were written with.
   
   A good example of this is partition spec. The table has a "default" 
partition spec that is used when writing new data, but existing data in the 
table was already written with one or more specs that don't change. Once a 
manifest is written using some partition spec, that spec is fixed and part of 
the metadata tree. We don't need to keep around the spec that was used to 
produce a given snapshot because it isn't useful. What is useful is what spec 
was used to write any given manifest, and that's encoded in the manifest's 
metadata.
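
   As a rough illustration of that point, here is a toy model in Python. It is 
not Iceberg's actual API: `Manifest`, `Table`, `write_manifest`, and `spec_for` 
are hypothetical names, and specs are reduced to plain strings. It only shows 
that changing the table's default spec leaves already-written manifests alone.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Manifest:
    path: str
    spec_id: int  # the spec this manifest was written with; fixed once written


@dataclass
class Table:
    specs: dict           # spec_id -> partition spec (a string in this sketch)
    default_spec_id: int  # only consulted when writing new data
    manifests: list = field(default_factory=list)

    def write_manifest(self, path):
        # New manifests pick up whatever the default spec is right now.
        manifest = Manifest(path, self.default_spec_id)
        self.manifests.append(manifest)
        return manifest

    def spec_for(self, manifest):
        # Readers resolve the spec from the manifest, not from the table default.
        return self.specs[manifest.spec_id]


table = Table(specs={0: "day(ts)"}, default_spec_id=0)
old = table.write_manifest("m0.avro")

# Evolve the table: add a new spec and make it the default for future writes.
table.specs[1] = "hour(ts)"
table.default_spec_id = 1
new = table.write_manifest("m1.avro")

print(table.spec_for(old))  # day(ts)
print(table.spec_for(new))  # hour(ts)
```

   Nothing in the sketch needs to remember which spec was the default when a 
given snapshot was produced; the manifest carries everything a reader needs.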
   
   Sort order and row identifier fields work the same way. Once we've written 
data in some order, we attach that order to what was written and no longer need 
to know what the table's order was. For the fields in delete files, we can use 
the table's row identifier fields, but once we've done that we no longer need 
to know what the table's configuration was.
   
   Schema is a bit different: we can change the schema and continue reading 
the same data files afterward, so to time-travel we need to know what the 
"current" schema was when a snapshot was written.
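
   A minimal sketch of why the schema has to be tracked per snapshot, again as 
a toy Python model rather than Iceberg's real classes (`Snapshot`, `schemas`, 
`current_schema_id`, and `read_schema` are all made-up names for illustration):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    schema_id: int     # the schema that was current when this snapshot was written
    data_files: tuple  # files are shared across schema changes


# Hypothetical schema history: schema 1 adds a column to schema 0.
schemas = {
    0: ("id", "data"),
    1: ("id", "data", "category"),
}

snapshot = Snapshot(snapshot_id=1, schema_id=0, data_files=("a.parquet",))

# The table schema changes afterward; the snapshot's data files do not.
current_schema_id = 1


def read_schema(snap, time_travel):
    # Time travel projects with the snapshot's own schema; a normal read
    # projects the same files with the table's current schema.
    return schemas[snap.schema_id if time_travel else current_schema_id]


print(read_schema(snapshot, time_travel=True))   # ('id', 'data')
print(read_schema(snapshot, time_travel=False))  # ('id', 'data', 'category')
```

   The same data file is read under two different schemas, so the files alone 
can't tell us which schema a time-travel query should use.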
   
   I think that we should not attach the row identifier fields to the schema, 
and should instead consider them table metadata that gets used on write, but is 
not needed for time travel.




