wmoustafa commented on a change in pull request #4301:
URL: https://github.com/apache/iceberg/pull/4301#discussion_r840467740



##########
File path: format/spec.md
##########
@@ -193,6 +193,17 @@ Notes:
 
 For details on how to serialize a schema to JSON, see Appendix C.
 
+#### Default value
+
+Default value can be assigned to a column when the column is added to an 
Iceberg table as part of the schema evolution. They are tracked at the level of 
a nested field inside a struct, thus it can be used for both top-level columns 
and nested columns. Iceberg tracks two default values internally: 
`initial-default` and `write-default`. The `initial-default` is used to read 
rows belonging to files that lack the column (i.e. the files were written 
before the column is added); the `write-default` value will be used for the 
automatically populating the column if user later inserts new rows without 
specifying the column.

Review comment:
       A couple of suggestions for the names: `file-to-record default` and 
`record-to-file default`, or `read-time default` and `write-time default`. I 
feel the first pair is expressive, symmetric, and accurately captures the 
function of each default. We might also explain that the former cannot be 
changed because it is used to read existing files, while the latter can be 
changed throughout the lifecycle of a table because it is only used to write 
new files.
   
   We might also start by explaining the semantics of default values from a 
_contract_ point of view:
   
   > A default value associated with a field is used to:
   > (1) populate the field's value for all records that were written before 
the field was introduced
   > (2) populate the field's value for any records that are written after 
the field is introduced, when such records do not supply that field's value.
   > All fields introduced at table creation time (i.e., not part of schema 
evolution) only leverage the second use case. Fields introduced as part of a 
schema evolution can leverage both use cases.
   > 
   > Changes to default values apply to future records only (i.e., records 
written after the default value has changed). To prevent retroactive changes to 
record values that are already written, each field is associated with two 
default values ... (then we can continue with the section above).
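
   To make the contract concrete, here is a minimal sketch of the two use 
cases in Python. It is only an illustration, not Iceberg code: `Field`, 
`read_record`, and `write_record` are hypothetical names, and records are 
modeled as plain dicts rather than data files.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Field:
    """Hypothetical model of a schema field that carries two defaults."""
    name: str
    initial_default: Any = None  # set when the field is added; never changed
    write_default: Any = None    # may be changed during the table's lifetime


def read_record(stored: Dict[str, Any], schema: List[Field]) -> Dict[str, Any]:
    """Use case (1): a field missing from stored data reads as initial_default."""
    return {
        f.name: stored[f.name] if f.name in stored else f.initial_default
        for f in schema
    }


def write_record(supplied: Dict[str, Any], schema: List[Field]) -> Dict[str, Any]:
    """Use case (2): a field the writer did not supply is stored as write_default."""
    return {
        f.name: supplied[f.name] if f.name in supplied else f.write_default
        for f in schema
    }


# A record written before "region" was added to the schema.
old_record = {"id": 1}

# Schema evolution adds "region"; both defaults start out the same.
schema = [
    Field("id"),
    Field("region", initial_default="unknown", write_default="unknown"),
]

# Later, only the write default is changed.
schema[1].write_default = "eu"

# New writes pick up the new default; old records still read the initial one.
new_record = write_record({"id": 2}, schema)
old_view = read_record(old_record, schema)
new_view = read_record(new_record, schema)
```

   In this sketch, changing `write_default` affects only `new_record`; 
`old_view` still reports the initial default, which is the "no retroactive 
change" property described above.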




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


