rdblue commented on code in PR #4301:
URL: https://github.com/apache/iceberg/pull/4301#discussion_r841926587
##########
format/spec.md:
##########
@@ -193,6 +193,17 @@ Notes:

 For details on how to serialize a schema to JSON, see Appendix C.

+#### Default value
+
+Default value can be assigned to a column when the column is added to an Iceberg table as part of the schema evolution. They are tracked at the level of a nested field inside a struct, thus it can be used for both top-level columns and nested columns. Iceberg tracks two default values internally: `initial-default` and `write-default`. The `initial-default` is used to read rows belonging to files that lack the column (i.e. the files were written before the column is added); the `write-default` value will be used for the automatically populating the column if user later inserts new rows without specifying the column.

Review Comment:

I really like the explanation of what default values are used for, although I think we need to be specific about which default value is used. The wording "`initial-default` is used to populate the field's value for all records that were written before the field was introduced" is great. I think that's more clear than my version.

I would be a bit more strict with the wording about changes. In the spec, there are two default values and we need to specify how each one behaves. We can also have a paragraph explaining how to make these appear like a single default to users, but we can't combine the two values in the spec.

I would update to this:

> Default values can be tracked for struct fields (both nested structs and the top-level schema's struct). There can be two defaults associated with a field:
>
> 1. `initial-default` is used to populate the field's value for all records that were written before the field was introduced
> 2. `write-default` is used to populate the field's value for any records written after the field is introduced, when the writer does not supply the field's value
>
> Note that all schema fields are required when writing data into a table. Omitting a known field from a data file is not allowed. The write default for a field should be written when a field is not supplied to a write. If the write default is not set, the writer must fail.
>
> The `initial-default` is set only when a field is added to an existing schema. The `write-default` is initially set to the same value as `initial-default` and can be changed through schema evolution.
>
> Together, the `initial-default` and `write-default` produce SQL default value behavior without rewriting data files. That is, changes to default values apply to future records only and all known fields are written into data files.
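
To make the proposed wording concrete, here is a minimal sketch of the two defaults in Python. It is only an illustration of the rules quoted above, not Iceberg's API: `Field`, `add_field`, `read_record`, `write_record`, and the `UNSET` sentinel are all hypothetical names. Returning `None` when a missing field has no `initial-default` is an assumption of the sketch; the suggested text does not cover that case.

```python
from dataclasses import dataclass
from typing import Any

# Sentinel so that "no default set" is distinguishable from a default of None.
UNSET = object()


@dataclass
class Field:
    """Hypothetical stand-in for a struct field; not an Iceberg class."""
    name: str
    initial_default: Any = UNSET
    write_default: Any = UNSET


def add_field(name: str, default: Any = UNSET) -> Field:
    # When a field is added, write-default starts out equal to initial-default.
    return Field(name, initial_default=default, write_default=default)


def read_record(stored: dict, schema: list) -> dict:
    """Read a stored record; fields absent from the file use initial-default."""
    out = {}
    for field in schema:
        if field.name in stored:
            out[field.name] = stored[field.name]
        elif field.initial_default is not UNSET:
            out[field.name] = field.initial_default
        else:
            # Assumption of this sketch: no initial-default reads as None.
            out[field.name] = None
    return out


def write_record(supplied: dict, schema: list) -> dict:
    """Build the record to write; every schema field must end up in the file."""
    out = {}
    for field in schema:
        if field.name in supplied:
            out[field.name] = supplied[field.name]
        elif field.write_default is not UNSET:
            out[field.name] = field.write_default
        else:
            # No supplied value and no write-default: the writer must fail.
            raise ValueError(f"no value or write-default for field {field.name!r}")
    return out


# Usage: "region" is added after some rows were already written.
schema = [Field("id"), add_field("region", "unknown")]
old_row = {"id": 1}                      # written before "region" existed
schema[1].write_default = "eu"           # later schema evolution
new_row = write_record({"id": 2}, schema)
```

In this sketch, changing `write_default` affects only `new_row`; reading `old_row` still yields the `initial-default`, which is the "future records only" behavior described in the last quoted paragraph.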
