[GitHub] [iceberg] rdblue commented on a diff in pull request #4301: Docs: Default value support feature specification

GitBox Mon, 04 Apr 2022 09:23:15 -0700


rdblue commented on code in PR #4301:
URL: https://github.com/apache/iceberg/pull/4301#discussion_r841926587



##########
format/spec.md:
##########
@@ -193,6 +193,17 @@ Notes:
 
 For details on how to serialize a schema to JSON, see Appendix C.
 
+#### Default value
+
+Default value can be assigned to a column when the column is added to an 
Iceberg table as part of the schema evolution. They are tracked at the level of 
a nested field inside a struct, thus it can be used for both top-level columns 
and nested columns. Iceberg tracks two default values internally: 
`initial-default` and `write-default`. The `initial-default` is used to read 
rows belonging to files that lack the column (i.e. the files were written 
before the column is added); the `write-default` value will be used for the 
automatically populating the column if user later inserts new rows without 
specifying the column.

Review Comment:
   I really like the explanation of what default values are used for, although 
I think we need to be specific about which default value is used. The wording 
"`initial-default` is used to populate the field's value for all records that 
were written before the field was introduced" is great. I think that's 
clearer than my version.
   
   I would be a bit more strict with the wording about changes. In the spec, 
there are two default values and we need to specify how each one behaves. We 
can also have a paragraph explaining how to make these appear like a single 
default to users, but we can't combine the two values in the spec.
   
   I would update to this:
   
   > Default values can be tracked for struct fields (both nested structs and 
the top-level schema's struct). There can be two defaults associated with a 
field:
   > 1. `initial-default` is used to populate the field's value for all records 
that were written before the field was introduced
   > 2. `write-default` is used to populate the field's value for any records 
written after the field is introduced, when the writer does not supply the 
field's value
   >
   > Note that all schema fields are required when writing data into a table. 
Omitting a known field from a data file is not allowed. The write default for a 
field should be written when a field is not supplied to a write. If the write 
default is not set, the writer must fail.
   >
   > The `initial-default` is set only when a field is added to an existing 
schema. The `write-default` is initially set to the same value as 
`initial-default` and can be changed through schema evolution.
   >
   > Together, the `initial-default` and `write-default` produce SQL default 
value behavior without rewriting data files. That is, changes to default values 
apply to future records only and all known fields are written into data files.
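
   To make the two behaviors concrete, here is a rough Python sketch of the 
semantics described in the wording above. It is only an illustration, not 
Iceberg code: `Field`, `UNSET`, `add_field`, `read_row`, and `write_row` are 
hypothetical names, rows are plain dicts, and the sketch assumes that any field 
missing from an old file has an `initial-default`.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

UNSET = object()  # hypothetical sentinel meaning "no default configured"


@dataclass
class Field:
    name: str
    initial_default: Any = UNSET
    write_default: Any = UNSET


def add_field(schema: List[Field], name: str, default: Any) -> None:
    """Add a field to an existing schema: initial-default is set here,
    and write-default starts out as the same value."""
    schema.append(Field(name, initial_default=default, write_default=default))


def read_row(schema: List[Field], stored: Dict[str, Any]) -> Dict[str, Any]:
    """Read a stored row; fields the file lacks are filled from initial-default."""
    return {f.name: stored.get(f.name, f.initial_default) for f in schema}


def write_row(schema: List[Field], supplied: Dict[str, Any]) -> Dict[str, Any]:
    """Produce the row to write: every known field is written, using
    write-default when the writer omits a value, and failing if none is set."""
    row = {}
    for f in schema:
        if f.name in supplied:
            row[f.name] = supplied[f.name]
        elif f.write_default is not UNSET:
            row[f.name] = f.write_default
        else:
            raise ValueError(f"no value or write-default for field {f.name!r}")
    return row


schema = [Field("id")]
old_row = write_row(schema, {"id": 1})    # written before "region" exists
add_field(schema, "region", "unknown")    # sets both defaults to "unknown"
schema[1].write_default = "eu"            # later change affects future writes only

read_old = read_row(schema, old_row)      # old file: filled from initial-default
new_row = write_row(schema, {"id": 2})    # new write: filled from write-default
```

   In this sketch the old row reads back with `region` equal to `"unknown"` 
while the new row is written with `"eu"`, and a write that omits `id` fails 
because `id` has no `write-default`.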



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

