RussellSpitzer commented on code in PR #4301:
URL: https://github.com/apache/iceberg/pull/4301#discussion_r858004222
##########
format/spec.md:
##########
@@ -194,6 +194,19 @@ Notes:
 For details on how to serialize a schema to JSON, see Appendix C.
+#### Default values
+
+Default values can be tracked for struct fields (both nested structs and the top-level schema's struct). There can be two defaults with a field:
+- `initial-default` is used to populate the field's value for all records that were written before the field was added to the schema
+- `write-default` is used to populate the field's value for any records written after the field was added to the schema, if the writer does not supply the field's value
+
+The `initial-default` is set only when a field is added to an existing schema. The `write-default` is initially set to the same value as `initial-default` and can be changed through schema evolution. If either default is not set for an optional field, then the default value is null for compatibility with older spec versions.
+
+Together, the `initial-default` and `write-default` produce SQL default value behavior without rewriting data files. That is, default value changes may only affect future records and all known fields are written into data files. To produce this behavior, omitting a known field when writing a data file is not allowed. The write default for a field must be written if a field is not supplied to a write. If the write default for a required field is not set, the writer must fail.

Review Comment:
I think this is a bit confusing because of the ordering of sentences. Perhaps something like "The ANSI(?) SQL default value behavior treats a new column with a default value as if all previous rows missing that field now have the default value. The default is allowed to be changed for new writes, but changing the default does not affect earlier writes. To achieve this behavior in Iceberg, omitting a known field when writing a new data file is never allowed.
The write default must be used when writing any new files if a value for the field is not provided. If the field is required, it is not supplied, and there is no default available, the write must fail."

I'm a little confused about allowing a default for required fields and then allowing writers not to supply them. Isn't this supposed to be the behavior for an Optional field which has not been set? Maybe add an example showing the difference between an Optional field with a write default and a Required field with a write default?

Sorry if I missed this discussion, but I'm unclear on the difference between Optional w/Default and Required w/Default.
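A small sketch may help with the Optional vs. Required question. It is a minimal Python illustration of the read and write rules as the proposed spec text states them.

The names `Field`, `UNSET`, `read_value` and `write_row` are hypothetical and chosen for illustration only; they are not Iceberg's Java API. One case is also an assumption: the sketch raises an error when a required field is missing from an older file and has no `initial-default`, because the quoted text does not cover that case.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Sentinel for "no default set", kept distinct from an explicit null (None).
UNSET = object()


@dataclass
class Field:
    """Hypothetical, simplified schema field; not Iceberg's actual API."""
    name: str
    required: bool
    initial_default: Any = UNSET
    write_default: Any = UNSET


def read_value(field: Field, file_row: Dict[str, Any]) -> Any:
    """Value a reader returns for `field` from one row of a data file."""
    if field.name in file_row:
        # The file was written after the field existed, so the value is stored.
        return file_row[field.name]
    # The file predates the field: fill in initial-default.
    if field.initial_default is not UNSET:
        return field.initial_default
    if not field.required:
        return None  # unset default on an optional field reads as null
    # Assumption: the quoted spec text does not cover this case.
    raise ValueError(f"required field {field.name!r} has no initial-default")


def write_row(schema: List[Field], supplied: Dict[str, Any]) -> Dict[str, Any]:
    """Row a writer stores; every known field is written, never omitted."""
    row: Dict[str, Any] = {}
    for field in schema:
        if field.name in supplied:
            value = supplied[field.name]
        elif field.write_default is not UNSET:
            value = field.write_default
        elif not field.required:
            value = None  # optional, no write-default: write null
        else:
            # Required, not supplied, and no write-default: the write must fail.
            raise ValueError(f"required field {field.name!r} not supplied, no write-default")
        if field.required and value is None:
            raise ValueError(f"required field {field.name!r} cannot be null")
        row[field.name] = value
    return row


# "region" was added with initial-default "unknown"; its write-default was
# later changed to "us". "note" is optional with no defaults.
schema = [
    Field("id", required=True),
    Field("region", required=True, initial_default="unknown", write_default="us"),
    Field("note", required=False),
]
old_row = {"id": 1}  # written before "region" and "note" were added
print([read_value(f, old_row) for f in schema])  # [1, 'unknown', None]
print(write_row(schema, {"id": 2}))  # {'id': 2, 'region': 'us', 'note': None}
```

Under these rules, Optional and Required fields behave the same whenever a `write-default` is set: an omitted value is filled with that default. They differ only when no `write-default` exists. In that case the optional field is written as null, while the required field makes the write fail.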
