rdblue commented on a change in pull request #4301:
URL: https://github.com/apache/iceberg/pull/4301#discussion_r823926347
##########
File path: format/spec.md
##########
@@ -193,10 +193,38 @@ Notes:
For details on how to serialize a schema to JSON, see Appendix C.
+#### Default value
+Default values can be assigned to top-level columns or nested fields. Default
values are used during schema evolution when adding a new column. The default
value is used to read rows belonging to the files that lack the column or
nested field prior to the schema evolution.
Review comment:
@wmoustafa and @RussellSpitzer, thinking about this a bit more, I think
we may want to have a second default in the format (but not exposed to users).
The first default is immutable and is set when a field is created. It tells
Iceberg what value to fill in when reading any file that doesn't contain the
field (`id=1` above). The second default would be the one used when writing
if the column isn't supplied (`id=3` above). The default for existing data
files never needs to change, but the default for future writes can change at
any time.
What do you think? This only needs to be visible at the format layer. Users
would set a default when creating a column and could change it later to affect
writes.
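
To make the two-default idea concrete, here is a minimal sketch in Python. It is only an illustration of the proposal, not Iceberg code: the names `Field`, `initial_default`, `write_default`, `read_row`, and `write_row` are hypothetical, and rows are modeled as plain dicts keyed by field ID.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Field:
    """Hypothetical schema field carrying two defaults (names are assumptions)."""
    field_id: int
    name: str
    initial_default: Any = None  # set once at field creation; treated as immutable
    write_default: Any = None    # used for future writes; may be changed later


def read_row(file_row: Dict[int, Any], fields: List[Field]) -> Dict[str, Any]:
    """Project a row stored in a data file (keyed by field ID) onto the schema.

    Fields the file does not contain are filled with the initial default.
    """
    return {
        f.name: file_row[f.field_id] if f.field_id in file_row else f.initial_default
        for f in fields
    }


def write_row(supplied: Dict[str, Any], fields: List[Field]) -> Dict[int, Any]:
    """Build the row to store (keyed by field ID) from the supplied values.

    Columns the writer did not supply are filled with the write default.
    """
    return {
        f.field_id: supplied[f.name] if f.name in supplied else f.write_default
        for f in fields
    }


# Field 2 is added after some files were already written without it.
fields = [
    Field(1, "id"),
    Field(2, "category", initial_default="unknown", write_default="unknown"),
]
old_file_row = {1: 7}  # written before field 2 existed

before_change = read_row(old_file_row, fields)

# Later, a user changes the default; only the write default moves.
fields[1].write_default = "misc"

after_change = read_row(old_file_row, fields)
new_file_row = write_row({"id": 8}, fields)
```

In this sketch, changing the default only affects `write_row`: rows read from the old file keep resolving `category` through `initial_default`, so existing data never changes meaning.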
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]