Re: [PR] [SPEC] Add implementation note about schema evolution [iceberg]

via GitHub Fri, 05 Sep 2025 11:12:09 -0700


emkornfield commented on code in PR #13936:
URL: https://github.com/apache/iceberg/pull/13936#discussion_r2325742735



##########
format/spec.md:
##########
@@ -1861,6 +1861,16 @@ Java writes `-1` for "no current snapshot" with V1 and V2 tables and considers t
 
 Some implementations require that GZIP compressed files have the suffix `.gz.metadata.json` to be read correctly. The Java reference implementation can additionally read GZIP compressed files with the suffix `metadata.json.gz`.  
 
+### Schema evolution and writing with old schemas
+
+Writers must write out all fields with the types specified from a schema present in table metadata. Writers should use the latest schema for writing. Not writing out all columns or not using the latest schema can change the semantics of the data written. The following are possible inconsistencies that can be introduced:
+
+* For all null columns, not writing out the column would cause `initial-default` value would be applied on reading instead of `null`.
+* If `write-default` has been changed then using an out-of-date schema would result in the incorrect value being populated.
+* If a `write` is the result of a partial row update (e.g. `update table set col_y = 'xyz'`) an out-of-date schema would silently drop values.

Review Comment:
   If an old schema is used, you implicitly end up dropping columns, because 
you can't read columns you don't know about. Thinking more about it, this 
should be unlikely to happen, because you would probably have to replay the 
transaction anyway. But effectively the sequence would be:
   
   1. Writer A writes a new schema with added columns and new data for the 
added column.
   2. Writer B uses an old schema (this would have to happen strictly after 
step 1), reads the new data, and modifies an existing column.
   3. Writer B's updates would drop the new data from the added column.
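
To make the sequence above concrete, here is a minimal sketch in plain Python. It is a toy model, not Iceberg code: rows are dicts, a schema is a list of column names, and `read_rows`, `write_rows`, and the column `col_new` are all hypothetical names made up for illustration. It assumes Writer B rewrites whole rows using only the columns its schema knows about.

```python
# Toy model (assumption): a data file is a list of dict rows and a schema is
# a list of column names. None of this is an Iceberg API.
OLD_SCHEMA = ["id", "col_y"]
NEW_SCHEMA = ["id", "col_y", "col_new"]  # col_new is a hypothetical added column


def read_rows(rows, schema, defaults=None):
    """Project stored rows onto `schema`; columns missing from a row get a default or None."""
    defaults = defaults or {}
    return [{c: r.get(c, defaults.get(c)) for c in schema} for r in rows]


def write_rows(rows, schema):
    """Persist only the columns that the writer's schema knows about."""
    return [{c: r[c] for c in schema} for r in rows]


# Step 1: Writer A uses the new schema and writes data for the added column.
stored = write_rows([{"id": 1, "col_y": "abc", "col_new": "important"}], NEW_SCHEMA)

# Step 2: Writer B, still on the old schema, reads the row and modifies an existing column.
b_view = read_rows(stored, OLD_SCHEMA)
for row in b_view:
    row["col_y"] = "xyz"

# Step 3: Writer B rewrites the row; col_new is not in its schema, so the value is dropped.
stored = write_rows(b_view, OLD_SCHEMA)

# A reader on the new schema now sees no value for col_new.
result = read_rows(stored, NEW_SCHEMA)
print(result)
```

In this model, the reader on the new schema gets `None` for `col_new`; in a real table the reader would instead apply whatever default the schema specifies for a missing column, which is the same silent loss described in the third bullet of the proposed text.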



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

