rdblue commented on a change in pull request #4301:
URL: https://github.com/apache/iceberg/pull/4301#discussion_r824299919
##########
File path: format/spec.md
##########
@@ -193,10 +193,38 @@ Notes:
For details on how to serialize a schema to JSON, see Appendix C.
+#### Default value
+Default values can be assigned to top-level columns or nested fields. Default
values are used during schema evolution when adding a new column. The default
value is used to read rows belonging to the files that lack the column or
nested field prior to the schema evolution.
Review comment:
Replying to @RussellSpitzer:
> For me this is a little odd because we are changing the underlying table
schema so I feel like the behavior shouldn't be implementation dependent.
100% agree that behavior cannot be implementation dependent!
> I feel like this is a little ambiguous, are writers required to use the
default value with optional columns or can different writers behave differently?
I think that optional vs. required should be a separate concept, unrelated to
default values. Optional allows null values in the column; required means that
null cannot be in the column. Then I think we have an initial default, which is
the default value set when the column is added and applied to all existing
rows. Then there is also a write default, which can change and is the value
that a writer _must_ write for the column if it is not supplied by the user.
The write default can change because it affects only future writes, not
existing data.
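A minimal Python sketch of the read side of this split, assuming a hypothetical `Field` record and `read_column` helper (illustrative names, not Iceberg's API). It shows that rows in a file written before the field existed read as the initial default, and that changing the write default afterwards does not change what those rows read as.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Set


# Hypothetical field record; not Iceberg's API.
@dataclass
class Field:
    name: str
    initial_default: Any  # fixed when the field is added
    write_default: Any    # may change; only affects future writes


def read_column(field: Field, file_columns: Set[str],
                rows: List[Dict[str, Any]]) -> List[Any]:
    """Project one field out of a single data file's rows."""
    if field.name not in file_columns:
        # The file predates the field, so every existing row reads as the
        # initial default.
        return [field.initial_default for _ in rows]
    return [row[field.name] for row in rows]


# A file written before "region" was added to the table.
old_columns = {"id"}
old_rows = [{"id": 1}, {"id": 2}]

region = Field("region", initial_default="unknown", write_default="unknown")
before = read_column(region, old_columns, old_rows)

region.write_default = "us"  # changing the write default...
after = read_column(region, old_columns, old_rows)  # ...leaves old rows as-is
```

Here `before` and `after` are both `["unknown", "unknown"]`: only the initial default is consulted when reading files that lack the column.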
The place where optional fields and default values overlap is that we currently
use `null` as the default value for optional fields. That's why I would say that
optional fields have an implicit initial default of `null`. To handle this, I
think we would assume that the initial default value for an existing optional
field is `null` and only allow changing the write default value. When you
create an optional field, you can set the initial default.
Here are some more concrete evolution rules with defaults:
* When creating a required field, an initial default value must be set.
Neither field values nor the default may be `null`.
* When creating an optional field, an initial default value can be set. If
the initial default value is not set, it is `null`.
* The initial default value may only be set when adding a field (or through
an incompatible change in the API).
* The write default value for a field starts as the initial default value
and is considered set if the field has an initial default.
* The write default value may be changed.
* When writing, the write default must be written into data files if a value
for the column is not supplied.
* When writing a field with no write default, the column must be supplied or
the write must fail.
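To check that the rules above fit together, here is a minimal Python sketch of them, assuming hypothetical `add_field`, `set_write_default`, and `write_value` helpers and an `UNSET` sentinel (none of these are Iceberg's API), with Python `None` standing in for `null`. The "incompatible change in the API" path for setting an initial default later is deliberately not modeled.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Hypothetical sentinel meaning "no default has been set" (distinct from None/null).
UNSET = object()


@dataclass
class Field:
    name: str
    required: bool
    initial_default: Any = UNSET
    write_default: Any = UNSET


def add_field(name: str, required: bool, initial_default: Any = UNSET) -> Field:
    """Add a field; this sketch only lets the initial default be set here."""
    if required and (initial_default is UNSET or initial_default is None):
        raise ValueError(f"required field {name!r} needs a non-null initial default")
    if initial_default is UNSET:
        # Optional field without an explicit default: implicit initial default of null.
        initial_default = None
    # The write default starts as the initial default, so it is considered set.
    return Field(name, required, initial_default, initial_default)


def set_write_default(field: Field, value: Any) -> None:
    """The write default may change; it only affects future writes."""
    if field.required and value is None:
        raise ValueError(f"required field {field.name!r} cannot default to null")
    field.write_default = value


def write_value(field: Field, supplied: Dict[str, Any]) -> Any:
    """Return the value a writer must store for `field` in a new row."""
    if field.name in supplied:
        value = supplied[field.name]
    elif field.write_default is not UNSET:
        value = field.write_default
    else:
        # No value supplied and no write default: the write must fail.
        raise ValueError(f"no value supplied for {field.name!r} and no write default")
    if field.required and value is None:
        raise ValueError(f"required field {field.name!r} cannot be null")
    return value


# Usage: a required field that predates defaults, plus two newly added fields.
legacy_id = Field("id", required=True)  # no initial or write default recorded
region = add_field("region", required=True, initial_default="unknown")
note = add_field("note", required=False)  # implicit initial default: None
set_write_default(region, "us")
```

With these definitions, `write_value(region, {"id": 1})` returns `"us"` while `region.initial_default` stays `"unknown"`, and `write_value(legacy_id, {})` raises because `id` has no write default. Using a separate `UNSET` sentinel rather than `None` is what lets "no write default" be distinguished from "the write default is `null`".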
Does that make it clear?