danielcweeks commented on code in PR #14004:
URL: https://github.com/apache/iceberg/pull/14004#discussion_r2461677807


##########
format/spec.md:
##########
@@ -1859,7 +1859,28 @@ Java writes `-1` for "no current snapshot" with V1 and 
V2 tables and considers t
 
 ### Naming for GZIP compressed Metadata JSON files
 
-Some implementations require that GZIP compressed files have the suffix 
`.gz.metadata.json` to be read correctly. The Java reference implementation can 
additionally read GZIP compressed files with the suffix `metadata.json.gz`.  
+Some implementations require that GZIP compressed files have the suffix 
`.gz.metadata.json` to be read correctly. The Java reference implementation can 
additionally read GZIP compressed files with the suffix `metadata.json.gz`.
+
+### Schema Evolution/Type Promotion
+
+Column projection rules are designed so that the table will remain readable 
even if writers use an outdated schema. At the beginning of a transaction, 
writers should load the latest schema (the schema referenced by 
`current-schema-id` from the latest table metadata) and use it for reading and 
writing data. Note that in the common cases of schema evolution (adding 
nullable columns, adding required columns with an `initial-default`, renaming a 
column, dropping a column, or doing type promotion), appending data with 
outdated schemas presents no issues under either SNAPSHOT or SERIALIZABLE 
isolation levels.
+
+However, the less common case of updating default values may need to be 
handled depending on isolation level. Consider two concurrent transactions:
+
+* **T1** modifies the `write-default` on the column.
+* **T2** writes data that makes use of the `write-default` of the column 
changed by **T1**.
+
+If **T1** commits before **T2**, then the handling of **T2** depends on the 
isolation level.
+
+* **SNAPSHOT**: **T2** may be committed even though it used the old 
`write-default` (this is a permitted serialization anomaly).
+* **SERIALIZABLE**: **T2** must abort.
+
+When a transaction is aborted, it can be retried after updating to the new 
schema and rewriting the data using the new `write-default`. One way of 
ensuring SERIALIZABLE isolation is a two-phase approach:
+
+1. Check if there was a schema change (for the REST catalog this can be done 
with `assert-current-schema-id`) when committing.

Review Comment:
   Do we need the additional assertion?  If the snapshot being committed 
matches the current schema, would that suffice?
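
For reference, the commit-time check under discussion can be sketched roughly as below. This is only an illustration under stated assumptions, not code from the PR or from any Iceberg implementation: `Isolation`, `may_commit`, and `schema_requirement` are hypothetical names, and the sketch conservatively treats any change of `current-schema-id` as a conflict under SERIALIZABLE, even though the proposed text only requires an abort when a `write-default` the writer relied on was changed. Only the first step quoted in the hunk is modeled.

```python
from enum import Enum


class Isolation(Enum):
    """Isolation levels named in the proposed spec text."""
    SNAPSHOT = "snapshot"
    SERIALIZABLE = "serializable"


def may_commit(isolation: Isolation,
               schema_id_at_start: int,
               current_schema_id: int) -> bool:
    """Hypothetical helper: decide whether a write may be committed.

    Assumption: under SERIALIZABLE, any schema change since the
    transaction started is treated as a conflict (stricter than needed).
    """
    if isolation is Isolation.SNAPSHOT:
        # Writing with the old write-default is a permitted anomaly.
        return True
    return schema_id_at_start == current_schema_id


def schema_requirement(schema_id: int) -> dict:
    """Build the body of an `assert-current-schema-id` table requirement,
    as it would be sent alongside a REST catalog commit."""
    return {"type": "assert-current-schema-id",
            "current-schema-id": schema_id}
```

With this shape, a SERIALIZABLE writer that started on schema 1 and finds schema 2 at commit time would abort and retry, while a SNAPSHOT writer would proceed.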



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

