Re: [PR] [SPEC] New revision on schema evolution [iceberg]

via GitHub Wed, 17 Sep 2025 14:13:47 -0700


emkornfield commented on code in PR #14004:
URL: https://github.com/apache/iceberg/pull/14004#discussion_r2356795445



##########
format/spec.md:
##########
@@ -1861,6 +1861,18 @@ Java writes `-1` for "no current snapshot" with V1 and 
V2 tables and considers t
 
 Some implementations require that GZIP compressed files have the suffix 
`.gz.metadata.json` to be read correctly. The Java reference implementation can 
additionally read GZIP compressed files with the suffix `metadata.json.gz`.  
 
+### Schema Evolution/Type Promotion
+
+Column projection rules are designed so that the table will remain readable 
even if writers use an outdated schema. At the beginning of a transaction, 
writers should load the latest schema (the schema pointed to by 
`current-schema-id` from the latest table metadata) and use it for reading and 
writing data. Note that in the common cases of schema evolution (adding 
nullable columns, adding required columns with an `initial-default`, renaming a 
column, dropping a column, or doing type promotion), appending data with 
outdated schemas presents no issues under either SNAPSHOT or SERIALIZABLE 
isolation levels.
+
+While writers are not required to bind to the latest schema, there are edge 
cases to consider:
+
+1. Assume two transactions are started concurrently. The first modifies 
the `write-default` on a column. The second is a data write that makes use of 
the `write-default` of the column changed by the first transaction. If the 
first transaction is committed first, the result of the second transaction 
depends on the isolation level. Under SNAPSHOT isolation, the second 
transaction can be committed. However, the second transaction produces the 
serialization anomaly of using the outdated `write-default` value. SERIALIZABLE 
isolation does not allow such anomalies, and the second transaction must fail 
in this mode. In this scenario, the transaction could be retried after updating 
to the new schema and rewriting the data using the new `write-default`. To 
generalize, for SERIALIZABLE isolation, writers must confirm that the schema 
did not have a change to a `write-default` value. This can be confirmed in two 
stages: first, check if there was any schema change (for the REST catalog this 
can be done with `assert-ref-snapshot-id`); second, if the schema changed, 
determine if there was a change to a `write-default` value used in the 
transaction. No check for schema changes is needed for SNAPSHOT isolation.

Review Comment:
   tried to reformat/reword with these ideas, let me know if this looks better
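
For illustration only, the two-stage check described in the hunk above could be sketched roughly as follows. This is not Iceberg API: the `Schema` class, the `can_commit` function, and the idea of detecting "any schema change" by comparing schema IDs are all assumptions made for this sketch, and a missing `write-default` is treated the same as a `None` one.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass(frozen=True)
class Schema:
    """Hypothetical stand-in for a table schema (not an Iceberg class)."""
    schema_id: int
    # Maps field ID -> write-default value for that field.
    write_defaults: Dict[int, Any] = field(default_factory=dict)


def can_commit(isolation: str,
               start_schema: Schema,
               current_schema: Schema,
               used_field_ids: Set[int]) -> bool:
    """Decide whether a data write bound to start_schema may commit.

    used_field_ids are the fields whose write-default the writer relied on.
    """
    # SNAPSHOT isolation needs no schema check at all.
    if isolation == "SNAPSHOT":
        return True

    # Stage 1: was there any schema change since the transaction began?
    # (Assumption: a changed schema ID signals a schema change.)
    if current_schema.schema_id == start_schema.schema_id:
        return True

    # Stage 2: the schema changed, so check whether any write-default
    # actually used by this transaction differs between the two schemas.
    for field_id in used_field_ids:
        old = start_schema.write_defaults.get(field_id)
        new = current_schema.write_defaults.get(field_id)
        if old != new:
            return False  # must fail; retry against the new schema
    return True
```

Under these assumptions, a SERIALIZABLE writer that relied on a changed `write-default` is rejected and would need to retry, while one that touched only unchanged defaults can still commit.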



##########
format/spec.md:
##########
@@ -1861,6 +1861,18 @@ Java writes `-1` for "no current snapshot" with V1 and 
V2 tables and considers t
 
 Some implementations require that GZIP compressed files have the suffix 
`.gz.metadata.json` to be read correctly. The Java reference implementation can 
additionally read GZIP compressed files with the suffix `metadata.json.gz`.  
 
+### Schema Evolution/Type Promotion
+
+Column projection rules are designed so that the table will remain readable 
even if writers use an outdated schema. At the beginning of a transaction, 
writers should load the latest schema (the schema pointed to by 
`current-schema-id` from the latest table metadata) and use it for reading and 
writing data. Note that in the common cases of schema evolution (adding 
nullable columns, adding required columns with an `initial-default`, renaming a 
column, dropping a column, or doing type promotion), appending data with 
outdated schemas presents no issues under either SNAPSHOT or SERIALIZABLE 
isolation levels.

Review Comment:
   I'm not sure this is relevant outside of the isolation level cases?



##########
format/spec.md:
##########
@@ -1861,6 +1861,18 @@ Java writes `-1` for "no current snapshot" with V1 and 
V2 tables and considers t
 
 Some implementations require that GZIP compressed files have the suffix 
`.gz.metadata.json` to be read correctly. The Java reference implementation can 
additionally read GZIP compressed files with the suffix `metadata.json.gz`.  
 
+### Schema Evolution/Type Promotion
+
+Column projection rules are designed so that the table will remain readable 
even if writers use an outdated schema. At the beginning of a transaction, 
writers should load the latest schema (the schema pointed to by 
`current-schema-id` from the latest table metadata) and use it for reading and 
writing data. Note that in the common cases of schema evolution (adding 
nullable columns, adding required columns with an `initial-default`, renaming a 
column, dropping a column, or doing type promotion), appending data with 
outdated schemas presents no issues under either SNAPSHOT or SERIALIZABLE 
isolation levels.
+
+While writers are not required to bind to the latest schema, there are edge 
cases to consider:

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

