rdblue commented on code in PR #14004:
URL: https://github.com/apache/iceberg/pull/14004#discussion_r2462223623

##########
format/spec.md:
##########
@@ -1875,6 +1875,25 @@ Some implementations require that GZIP compressed files have the suffix `.gz.met
 Although the spec allows for including the deleted row itself (in addition to the path and position of the row in the data file) in v2 position delete files, writing the row is optional and no implementation currently writes it. The ability to write and read the row is supported in the Java implementation but is deprecated in version 1.11.0.

+### Schema Evolution/Type Promotion
+
+Column projection rules are designed so that the table will remain readable even if writers use an outdated schema. At the beginning of a transaction Writers should load the latest schema (the schema referenced by `current-schema-id` from the latest table metadata) and use it for reading and writing data. Note, that in the common cases of schema evolution (adding nullable columns, adding required columns with an `initial-default`, renaming a column, dropping a column, or doing type promotion), appending data with outdated schemas presents no issues under either SNAPSHOT or SERIALIZABLE isolation levels
+
+However, the less common case of updating default values may need to be handled depending on isolation level. Consider two concurrent transactions:
+
+* **T1** modifies the `write-default` on the column.

Review Comment:
   Does this cover the case with writers that don't produce a column with an initial-default? I'm thinking that you have a writer that is appending (maybe a streaming writer that doesn't update its schema) and a different writer adds a column with an initial-default that doesn't match the write-default. Then the stream writer will continue appending files that end up getting the initial default, rather than using the write default.

   I think this is probably subsumed by the write default case here, but I want to make sure we're aligned.

--
This is an automated message from the Apache Git Service.
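The stale-writer scenario raised in the review comment can be sketched as a toy model. This is a minimal illustration, not Iceberg code: `Column`, `DataFile`, `write_file`, and `read_file` are hypothetical names, the column `region` and its default values are made up, and columns are matched by name for brevity where Iceberg resolves them by field ID. It assumes only the spec's stated behavior: a column missing from a data file is read back as its `initial-default`, while a writer that knows the column but is given no value fills in the `write-default`.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Column:
    # Hypothetical stand-in for a schema field with its two defaults.
    name: str
    initial_default: Any = None
    write_default: Any = None


@dataclass
class DataFile:
    columns: tuple  # column names physically present in the file
    rows: list      # one dict per row, keyed by column name


def write_file(schema, records):
    """Write records with the writer's own (possibly stale) schema.

    A column the writer knows about but has no value for gets write-default.
    A column the writer does not know about is simply absent from the file.
    """
    names = tuple(c.name for c in schema)
    rows = [{c.name: rec.get(c.name, c.write_default) for c in schema}
            for rec in records]
    return DataFile(names, rows)


def read_file(schema, data_file):
    """Project a file onto the reader's schema.

    A column absent from the file is filled with initial-default.
    """
    return [
        {c.name: (row[c.name] if c.name in data_file.columns
                  else c.initial_default)
         for c in schema}
        for row in data_file.rows
    ]


# Schema before and after another writer adds `region`, with an
# initial-default that differs from the write-default.
old_schema = [Column("id")]
new_schema = [Column("id"),
              Column("region", initial_default="unknown", write_default="eu")]

# The streaming writer never refreshed its schema; the other writer did.
stale_file = write_file(old_schema, [{"id": 1}])
fresh_file = write_file(new_schema, [{"id": 2}])

stale_rows = read_file(new_schema, stale_file)
fresh_rows = read_file(new_schema, fresh_file)
print(stale_rows)
print(fresh_rows)
```

In this model the row appended by the stale writer reads back with the `initial-default`, even though it was written after the column was added, while the row from the up-to-date writer carries the `write-default`.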
