rdblue commented on a change in pull request #4301:
URL: https://github.com/apache/iceberg/pull/4301#discussion_r840030461



##########
File path: format/spec.md
##########
@@ -193,6 +193,17 @@ Notes:
 
 For details on how to serialize a schema to JSON, see Appendix C.
 
+#### Default value
+
+Default value can be assigned to a column when the column is added to an 
Iceberg table as part of the schema evolution. They are tracked at the level of 
a nested field inside a struct, thus it can be used for both top-level columns 
and nested columns. Iceberg tracks two default values internally: 
`initial-default` and `write-default`. The `initial-default` is used to read 
rows belonging to files that lack the column (i.e. the files were written 
before the column is added); the `write-default` value will be used for the 
automatically populating the column if user later inserts new rows without 
specifying the column.

Review comment:
       Beginning with "assigned ... when ..." gives no context about what 
field defaults are. This should begin with a high-level explanation and then go 
into detail. Also, this is the Iceberg spec, so it doesn't make sense to refer 
to what "Iceberg tracks". Instead, this should state what is required.
   
   > Default values can be tracked for struct fields (both nested structs and 
the top-level schema's struct). There are two defaults for a field:
   > * `initial-default` is a value that must be projected when reading a data 
file that was written before the field was added to the schema
   > * `write-default` is a value that must be written for all rows when the 
field is missing from input data while writing a new data file
   >
   > Note that all schema fields are required when writing data into a table. 
Omitting a known field from a data file is not allowed. The write default for a 
field should be written when a field is not supplied to a write. If the write 
default is not set, the writer must fail.
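
To make the two rules in the suggested wording concrete, here is a minimal sketch of the read and write paths. It is an illustration only, not Iceberg's implementation: `Field`, `project_row`, `complete_row`, and the `_UNSET` sentinel are hypothetical names, rows are modeled as plain dicts, and returning `None` on read when no `initial-default` is set is an assumption, since the text above does not cover that case.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Set

# Sentinel so that "no default set" can be told apart from a default of None.
_UNSET = object()


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a struct field that carries both defaults."""
    name: str
    initial_default: Any = _UNSET
    write_default: Any = _UNSET


def project_row(file_row: Dict[str, Any], file_fields: Set[str],
                schema: List[Field]) -> Dict[str, Any]:
    """Read path: project initial-default for fields the data file lacks."""
    out = {}
    for field in schema:
        if field.name in file_fields:
            out[field.name] = file_row[field.name]
        elif field.initial_default is not _UNSET:
            out[field.name] = field.initial_default
        else:
            # Assumption: this case is not described in the comment above.
            out[field.name] = None
    return out


def complete_row(input_row: Dict[str, Any], schema: List[Field]) -> Dict[str, Any]:
    """Write path: every schema field is written; fill gaps with write-default."""
    out = {}
    for field in schema:
        if field.name in input_row:
            out[field.name] = input_row[field.name]
        elif field.write_default is not _UNSET:
            out[field.name] = field.write_default
        else:
            # No write default is set, so the writer must fail.
            raise ValueError(f"no value or write-default for field {field.name!r}")
    return out


schema = [
    Field("id"),
    Field("category", initial_default="unknown", write_default="misc"),
]

# A file written before "category" was added is read with the initial default.
old_row = project_row({"id": 1}, {"id"}, schema)

# A new row that omits "category" is written with the write default.
new_row = complete_row({"id": 2}, schema)
```

Here `old_row` carries `"unknown"` for `category` while `new_row` carries `"misc"`, which shows why the two defaults are tracked separately. Calling `complete_row({"category": "x"}, schema)` raises, because `id` has no write default.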




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


