rdblue commented on a change in pull request #4301:
URL: https://github.com/apache/iceberg/pull/4301#discussion_r824299919



##########
File path: format/spec.md
##########
@@ -193,10 +193,38 @@ Notes:
 
 For details on how to serialize a schema to JSON, see Appendix C.
 
+#### Default value
+Default values can be assigned to top-level columns or nested fields. Default values are used during schema evolution when adding a new column. The default value is used to read rows belonging to the files that lack the column or nested field prior to the schema evolution.
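A minimal Java sketch of the read-side rule in the proposed spec text, assuming a row from a data file is modeled as a `Map` from column name to value. The `ReadDefaults` class, its `Field` record, and `project` are hypothetical names for illustration, not Iceberg's API. A column missing from an older file is filled with the field's initial default, while a `null` that was actually written stays `null`.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ReadDefaults {
    // Hypothetical field model: only a name and its initial default.
    record Field(String name, Object initialDefault) {}

    // Projects a row read from a data file onto the current schema.
    // Columns absent from the file (written before the field was added)
    // are filled with the field's initial default.
    static Map<String, Object> project(List<Field> schema, Map<String, Object> fileRow) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Field f : schema) {
            // containsKey separates "column missing from the file" from "value written as null".
            out.put(f.name(), fileRow.containsKey(f.name()) ? fileRow.get(f.name()) : f.initialDefault());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Field> schema = List.of(new Field("id", null), new Field("region", "us-east"));
        Map<String, Object> oldRow = Map.of("id", 1);   // written before "region" existed
        Map<String, Object> newRow = new HashMap<>();
        newRow.put("id", 2);
        newRow.put("region", null);                     // explicitly written null
        System.out.println(project(schema, oldRow));    // {id=1, region=us-east}
        System.out.println(project(schema, newRow));    // {id=2, region=null}
    }
}
```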

Review comment:
       Replying to @RussellSpitzer:
   
   > For me this is a little odd because we are changing the underlying table schema so I feel like the behavior shouldn't be implementation dependent.
   
   100% agree that behavior cannot be implementation dependent!
   
   > I feel like this is a little ambiguous, are writers required to use the default value with optional columns or can different writers behave differently?
   
   I think optional vs. required should be a separate concept, unrelated to default values. Optional allows null values in the column; required means that null cannot be in the column. Then I think we have an initial default, which is the default value set when the column is added and applied to all existing rows. There is also a write default, which can change and is the value that a writer _must_ write for the column if it is not supplied by the user. The write default can change because it affects future writes, not existing data.
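A hypothetical Java sketch of the initial default vs. write default split, tracking one added column (`status`, an invented name) across several writes. `TwoDefaults` and its methods are illustrative only, not Iceberg's API, and rows are modeled as plain `Map`s. The initial default is fixed when the column is added and answers reads of rows that lack the column; the write default is materialized into new rows, so changing it later does not alter rows already written.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TwoDefaults {
    static final String COL = "status";              // hypothetical added column
    final Object initialDefault;                     // fixed when the column is added
    Object writeDefault;                             // starts as the initial default; may change
    final List<Map<String, Object>> files = new ArrayList<>(); // stored rows

    TwoDefaults(Object initialDefault) {
        this.initialDefault = initialDefault;
        this.writeDefault = initialDefault;
    }

    // Writing: if the user did not supply the column, store the current write default.
    void write(Map<String, Object> userRow) {
        Map<String, Object> row = new HashMap<>(userRow);
        if (!row.containsKey(COL)) {
            row.put(COL, writeDefault);
        }
        files.add(row);
    }

    // Reading: rows that lack the column (older files) get the initial default.
    Object read(Map<String, Object> row) {
        return row.containsKey(COL) ? row.get(COL) : initialDefault;
    }

    public static void main(String[] args) {
        TwoDefaults t = new TwoDefaults("new");      // add column with initial default "new"
        t.files.add(Map.of("id", 1));                // row from a file written before the column existed
        t.write(Map.of("id", 2));                    // stores the write default "new"
        t.writeDefault = "pending";                  // change the write default
        t.write(Map.of("id", 3));                    // stores "pending"
        for (Map<String, Object> r : t.files) {
            System.out.println(r.get("id") + " " + t.read(r)); // 1 new / 2 new / 3 pending
        }
    }
}
```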
   
   The place where optional and default values overlap is that we currently use `null` as the default value for optional fields. That's why I would say that optional fields have an implicit initial default of `null`. To handle this, I think we would assume that the initial default value for an existing optional field is `null` and only allow changing the write default value. When you create an optional field, you can set the initial default.
   
   Here are some more concrete evolution rules with defaults:
   * When creating a required field, an initial default value must be set. Neither field values nor the default may be `null`.
   * When creating an optional field, an initial default value can be set. If the initial default value is not set, it is `null`.
   * The initial default value may only be set when adding a field (or through an incompatible change in the API).
   * The write default value for a field starts as the initial default value and is considered set if the field has an initial default.
   * The write default value may be changed.
   * When writing, the write default must be written into data files if a value for the column is not supplied.
   * When writing a field with no write default, the column must be supplied or the write must fail.
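The rules above can be sketched as validation logic. This is a hypothetical Java model (`DefaultField` and its methods are invented for illustration, not Iceberg's API). One assumption worth flagging: the comment says an optional field without an explicit initial default has a `null` initial default, but leaves open whether that counts as a set write default; this sketch treats it as unset, so a write that omits such a field fails until a write default is assigned.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the proposed default-value rules; not Iceberg's API.
final class DefaultField {
    final String name;
    final boolean required;
    final Object initialDefault;   // no setter: chosen only when the field is added
    private Object writeDefault;
    private boolean writeDefaultSet;

    private DefaultField(String name, boolean required, Object initialDefault, boolean defaultSet) {
        this.name = name;
        this.required = required;
        this.initialDefault = initialDefault;
        this.writeDefault = initialDefault;  // write default starts as the initial default
        this.writeDefaultSet = defaultSet;   // set only if an initial default was given
    }

    // Required fields must be created with a non-null initial default.
    static DefaultField addRequired(String name, Object initialDefault) {
        if (initialDefault == null) {
            throw new IllegalArgumentException("required field needs a non-null initial default: " + name);
        }
        return new DefaultField(name, true, initialDefault, true);
    }

    // Optional field with no explicit default: initial default is null (assumed: write default unset).
    static DefaultField addOptional(String name) {
        return new DefaultField(name, false, null, false);
    }

    static DefaultField addOptional(String name, Object initialDefault) {
        return new DefaultField(name, false, initialDefault, true);
    }

    // The write default may change; a required field's default may not be null.
    void setWriteDefault(Object value) {
        if (required && value == null) {
            throw new IllegalArgumentException("required field default cannot be null: " + name);
        }
        writeDefault = value;
        writeDefaultSet = true;
    }

    // The value a writer must store for this field, given the user-supplied row.
    Object valueToWrite(Map<String, Object> supplied) {
        if (supplied.containsKey(name)) {
            Object v = supplied.get(name);
            if (required && v == null) {
                throw new IllegalArgumentException("null value for required field: " + name);
            }
            return v;
        }
        if (!writeDefaultSet) {
            throw new IllegalStateException("no value and no write default for: " + name);
        }
        return writeDefault;
    }

    static Map<String, Object> writeRow(List<DefaultField> schema, Map<String, Object> supplied) {
        Map<String, Object> row = new HashMap<>();
        for (DefaultField f : schema) {
            row.put(f.name, f.valueToWrite(supplied));
        }
        return row;
    }

    public static void main(String[] args) {
        DefaultField level = addRequired("level", 0);
        DefaultField note = addOptional("note");
        level.setWriteDefault(1);
        System.out.println(writeRow(List.of(level), Map.of()).get("level")); // 1
        try {
            writeRow(List.of(level, note), Map.of());
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // no value and no write default for: note
        }
    }
}
```

Keeping `initialDefault` final with no setter encodes the rule that it can only be chosen when the field is added, while the write default stays mutable.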
   
   Does that make it clear?




-- 
This is an automated message from the Apache Git Service.