wmoustafa commented on a change in pull request #4301:
URL: https://github.com/apache/iceberg/pull/4301#discussion_r824299518
##########
File path: format/spec.md
##########
@@ -193,10 +193,38 @@ Notes:
For details on how to serialize a schema to JSON, see Appendix C.
+#### Default value
+Default values can be assigned to top-level columns or nested fields. Default
values are used during schema evolution when adding a new column. The default
value is used to read rows belonging to the files that lack the column or
nested field prior to the schema evolution.
Review comment:
Sorry, I should have been clearer. When I said "The default value can be
changed without any consequences", I meant in an ideal situation where an
immediate rewrite follows the `ALTER TABLE t ADD col DEFAULT 1` and
materializes the `1` in the existing rows (similar to what DBMSs do). Assuming
we are in this world (`1` is materialized immediately), a later `ALTER TABLE t
ALTER col SET DEFAULT 3` can only affect future rows where `INSERT INTO` does
not specify that column (since for past rows, `1` is already in the file). So
yes, I think we are saying the same thing, namely that `SET DEFAULT` does not
change the values of existing rows. My point was that this would be the natural
behavior even if we combined both concepts into one, provided a rewrite takes
place immediately.
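To make the "ideal world" above concrete, here is a minimal sketch in plain
Python. It is a toy model, not Iceberg code: `ToyTable` and its methods are
hypothetical names, and the immediate rewrite is simulated by filling the
default into every existing row when the column is added.

```python
# Toy model only: ToyTable is a hypothetical stand-in, not an Iceberg API.
class ToyTable:
    def __init__(self, rows):
        self.rows = [dict(r) for r in rows]
        self.defaults = {}

    def add_column(self, name, default):
        # ALTER TABLE t ADD col DEFAULT v, followed by an immediate rewrite
        # that materializes v into every existing row.
        self.defaults[name] = default
        for row in self.rows:
            row[name] = default

    def set_default(self, name, default):
        # ALTER TABLE t ALTER col SET DEFAULT v: existing rows are untouched,
        # only later inserts pick up the new value.
        self.defaults[name] = default

    def insert(self, row):
        # INSERT INTO that may omit columns; omitted ones get the current default.
        full = dict(self.defaults)
        full.update(row)
        self.rows.append(full)


t = ToyTable([{"id": 1}, {"id": 2}])
t.add_column("col", 1)   # existing rows now physically contain col = 1
t.set_default("col", 3)  # affects future inserts only
t.insert({"id": 3})      # col omitted, so it becomes 3
```

In this model the two pre-existing rows keep `col = 1` and only the newly
inserted row gets `col = 3`, which is the behavior described above.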
The main problem that I see here is that, at the Iceberg API level, I expect
both concepts of default values will have to be exposed, if only because
sometimes we say the default can be changed freely (for future rows) and
sometimes we say it can only be changed behind an `allowIncompatibleChanges`
API (for existing rows). Exposing two ways or two concepts of default values
might be unnecessarily complex.
So if I were to choose between options, both of these sound clean:
1- Default value for `schema evolution` is the same as the one for `INSERT
INTO`. Neither can change unless `allowIncompatibleChanges()` is called.
2- Default value for `schema evolution` is the same as the one for `INSERT
INTO`. The `schema evolution` instance is materialized right away through a
full rewrite upon the DDL, allowing all other instances (i.e., `INSERT INTO`)
to change. From the user's point of view, the default value (regardless of
which instance it is) just changes without problems.
In both cases, we do not support the `INSERT INTO` use case yet.
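For option 1, a rough sketch of what "cannot change unless
`allowIncompatibleChanges()` is called" could look like, again as a plain
Python toy model. `LazyDefaultTable`, `scan`, and the snake_case method names
are hypothetical and do not correspond to actual Iceberg classes. Here the
default is never materialized; it is filled in at read time for files that
lack the column, which is why changing it would alter existing rows.

```python
# Toy model only: hypothetical names, not Iceberg's API.
class LazyDefaultTable:
    def __init__(self, files):
        # Each "file" is a list of rows written under an older schema.
        self.files = [[dict(r) for r in f] for f in files]
        self.defaults = {}
        self._allow_incompatible = False

    def allow_incompatible_changes(self):
        # Opt in to changes that can alter what existing rows read as.
        # (In this sketch the flag simply stays on once set.)
        self._allow_incompatible = True
        return self

    def add_column(self, name, default):
        # No rewrite: old files still lack the column.
        self.defaults[name] = default

    def set_default(self, name, default):
        if name in self.defaults and not self._allow_incompatible:
            raise ValueError(f"changing default of {name!r} is incompatible")
        self.defaults[name] = default

    def scan(self):
        # Fill in the default for rows whose file lacks the column.
        return [{**self.defaults, **row} for f in self.files for row in f]


t = LazyDefaultTable([[{"id": 1}]])
t.add_column("col", 1)
before = t.scan()

try:
    t.set_default("col", 3)
    rejected = False
except ValueError:
    rejected = True

t.allow_incompatible_changes().set_default("col", 3)
after = t.scan()
```

In this sketch the plain `set_default` call is rejected, and once the change
is forced through, the old row reads as `col = 3` instead of `col = 1`. Option
2 avoids that by rewriting at DDL time, so old files already contain the value.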
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]