rdblue commented on pull request #2354: URL: https://github.com/apache/iceberg/pull/2354#issuecomment-810410893
@openinx, I think I understand what you're saying: if we track the schema for a snapshot, then why not also track the partition spec, sort order, and row identifier fields?

The reason is that the partition spec, sort order, and row identifier fields do not affect existing snapshots. They are used when writing, not when reading. When reading, we always use whatever the data files were written with.

Partition spec is a good example. The table has a "default" partition spec that is used when writing new data, but existing data in the table was already written with one or more specs, and those don't change. Once a manifest is written using some partition spec, that spec is fixed and becomes part of the metadata tree. We don't need to keep the spec that was used to produce a given snapshot because it isn't useful. What is useful is which spec was used to write each manifest, and that is encoded in the manifest's metadata.

Sort order and row identifier fields work the same way. Once we've written data in some order, we attach that order to what we wrote, and we no longer need to know what the table's order was. For the fields used in a delete file, we can use the table's row identifier fields, but once we've done that we no longer need to know what the table's configuration was.

Schema is a bit different. We need to know what the "current" schema was in order to time travel, because we can change the schema and continue reading the same data files afterward.

I think that we should not attach the row identifier fields to the schema. Instead, we should consider them table metadata that is used on write but is not needed for time travel.
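A minimal Python sketch of this argument, assuming a heavily simplified, hypothetical metadata model (the class and field names below are illustrative, not Iceberg's actual API): each snapshot records the schema that was current when it was committed, each manifest records the spec it was written with, and the table-level default spec and row identifier fields are only consulted by writers. Reading an old snapshot after the table evolves still resolves the old schema and each manifest's original spec.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical, simplified model; not the Iceberg Java or Python API.


@dataclass(frozen=True)
class Manifest:
    path: str
    spec_id: int  # spec used to write this manifest; fixed once written


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    schema_id: int  # schema that was current when this snapshot was committed
    manifests: Tuple[Manifest, ...]


@dataclass
class TableMetadata:
    schemas: Dict[int, List[str]]     # schema id -> column names
    specs: Dict[int, List[str]]       # spec id -> partition fields
    current_schema_id: int
    default_spec_id: int              # write-time setting only
    row_identifier_fields: List[str]  # write-time setting only (writers not modeled here)
    snapshots: List[Snapshot] = field(default_factory=list)

    def commit(self, new_paths: List[str]) -> Snapshot:
        """Append a snapshot; newly written manifests record the default spec."""
        previous = self.snapshots[-1].manifests if self.snapshots else ()
        added = tuple(Manifest(p, self.default_spec_id) for p in new_paths)
        snap = Snapshot(len(self.snapshots) + 1, self.current_schema_id, previous + added)
        self.snapshots.append(snap)
        return snap

    def plan_read(self, snapshot_id: int):
        """Time travel: the schema comes from the snapshot, the spec from each manifest."""
        snap = next(s for s in self.snapshots if s.snapshot_id == snapshot_id)
        schema = self.schemas[snap.schema_id]
        files = [(m.path, self.specs[m.spec_id]) for m in snap.manifests]
        return schema, files


# Usage: write once, evolve schema, default spec, and row identifiers, write again.
meta = TableMetadata(
    schemas={0: ["id", "ts"], 1: ["id", "ts", "category"]},
    specs={0: ["day(ts)"], 1: ["day(ts)", "category"]},
    current_schema_id=0,
    default_spec_id=0,
    row_identifier_fields=["id"],
)
meta.commit(["m1.avro"])
meta.current_schema_id = 1
meta.default_spec_id = 1
meta.row_identifier_fields = ["id", "category"]
meta.commit(["m2.avro"])

schema, files = meta.plan_read(1)
# schema == ["id", "ts"]; files == [("m1.avro", ["day(ts)"])]
```

Note that `plan_read` never looks at `default_spec_id` or `row_identifier_fields`: changing them affects only future writes, while the snapshot's schema id is what time travel actually needs.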
