wmoustafa commented on a change in pull request #4301:
URL: https://github.com/apache/iceberg/pull/4301#discussion_r825009591
##########
File path: format/spec.md
##########
@@ -193,10 +193,38 @@ Notes:
For details on how to serialize a schema to JSON, see Appendix C.
+#### Default value
+Default values can be assigned to top-level columns or nested fields. Default
values are used during schema evolution when adding a new column. The default
value is used to read rows belonging to the files that lack the column or
nested field prior to the schema evolution.
Review comment:
> I'm not sure what you mean that this "has the side effect of not
> distinguishing between organic 34 and 34 from schema evolution"
That was a caveat for the case where the user issues an `UPDATE` statement to
change the default value (an alternative when dropping and recreating the
column is no longer feasible because more values have been written). After
running code snippet (1) and inserting more data, users can still change the
default value with an `UPDATE` DML statement. However, that statement updates
not only the rows that existed before schema evolution, but also any rows
written after schema evolution that happen to store the same value as the
default.
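To make the caveat concrete, here is a minimal sketch that uses an in-memory SQLite database as a stand-in. It is not Iceberg code and only mimics the behavior under discussion. The table `t`, the column `c`, and the replacement value `50` are hypothetical; `34` is the default value from the example above.

```python
import sqlite3


def demo():
    conn = sqlite3.connect(":memory:")
    # Rows 1 and 2 exist before the column is added (pre schema evolution).
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO t (id) VALUES (?)", [(1,), (2,)])

    # Schema evolution: add column c with default 34. Old rows now read as 34.
    conn.execute("ALTER TABLE t ADD COLUMN c INTEGER DEFAULT 34")

    # Rows written afterwards: row 3 stores an "organic" 34, row 4 stores 7.
    conn.executemany("INSERT INTO t (id, c) VALUES (?, ?)", [(3, 34), (4, 7)])

    # Attempt to "change the default" via DML. The predicate cannot tell
    # a default-filled 34 apart from an explicitly written 34.
    cur = conn.execute("UPDATE t SET c = 50 WHERE c = 34")
    updated = cur.rowcount

    rows = conn.execute("SELECT id, c FROM t ORDER BY id").fetchall()
    conn.close()
    return updated, rows


if __name__ == "__main__":
    updated, rows = demo()
    print(updated)  # 3: rows 1 and 2 (default-filled) plus row 3 (organic 34)
    print(rows)     # [(1, 50), (2, 50), (3, 50), (4, 7)]
```

Row 3 is rewritten even though its value never came from the default, which is the side effect described above.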
> That works because the use case is that you've made a mistake and need to
> correct it _before writing any data_.
Agreed. It was not initially clear whether the suggestion also applied to the
case where more data has already been written. I guess this is where the
`UPDATE` route above can be helpful.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]