wmoustafa commented on a change in pull request #4301:
URL: https://github.com/apache/iceberg/pull/4301#discussion_r824299518



##########
File path: format/spec.md
##########
@@ -193,10 +193,38 @@ Notes:
 
 For details on how to serialize a schema to JSON, see Appendix C.
 
+#### Default value
+Default values can be assigned to top-level columns or nested fields. Default 
values are used during schema evolution when adding a new column. The default 
value is used to read rows belonging to the files that lack the column or 
nested field prior to the schema evolution.

Review comment:
       Sorry, I should have been clearer. When I said "The default value can be 
changed without any consequences", I meant in an ideal situation, where an 
immediate rewrite follows the `ALTER TABLE t ADD col DEFAULT 1` and 
materializes the `1` in the existing rows (similar to how DBMSs do it). Assuming 
we are in this world (`1` is materialized immediately), if 
`ALTER TABLE t ALTER col SET DEFAULT 3` takes place later, it can only affect 
future rows for which `INSERT INTO` does not specify that column (since for past 
rows, `1` is already in the file). So yes, I think we are saying the same thing: 
`SET DEFAULT` does not change the values of existing rows. My point was that 
this would be the natural behavior even if we combined both concepts into one, 
assuming a rewrite takes place immediately.
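   
   To make the scenario concrete, here is a minimal sketch of that "ideal 
world" as a toy in-memory model. `ToyTable` and its methods are hypothetical 
names made up for illustration; this is not Iceberg API, and it assumes the 
rewrite happens as part of the `ADD col` DDL.

```python
# Toy model (hypothetical, not Iceberg API): a table is a list of row dicts
# plus a per-column default that INSERT uses when the column is omitted.
class ToyTable:
    def __init__(self, rows):
        self.rows = [dict(r) for r in rows]
        self.defaults = {}

    def add_column(self, name, default):
        # ALTER TABLE t ADD col DEFAULT <default>, assumed to be followed by an
        # immediate rewrite: the default is materialized into every existing row.
        for row in self.rows:
            row[name] = default
        self.defaults[name] = default

    def set_default(self, name, default):
        # ALTER TABLE t ALTER col SET DEFAULT <default>: metadata-only change.
        self.defaults[name] = default

    def insert(self, **values):
        # INSERT INTO: columns that are not specified take the current default.
        row = dict(self.defaults)
        row.update(values)
        self.rows.append(row)


t = ToyTable([{"id": 1}, {"id": 2}])
t.add_column("col", 1)   # existing rows now physically contain 1
t.set_default("col", 3)  # only affects rows inserted from now on
t.insert(id=3)           # col omitted, so it takes the new default
print([r["col"] for r in t.rows])  # [1, 1, 3]
```

   Under this assumption the past rows keep `1` and only the newly inserted 
row sees `3`.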
   
   The main problem that I see with the two default values is that, at the 
Iceberg API level, I expect both concepts would have to be exposed somehow, if 
only because sometimes we say the default can be changed freely (for future 
rows) and sometimes we say it can only be changed behind an 
`allowIncompatibleChanges` API (for existing rows). Exposing two concepts of 
default values might be unnecessarily complex.
   
   So if I were to choose between options, both of these sound clean:
   1- Default value for `schema evolution` is the same as the one for `INSERT 
INTO`. Neither can change unless `allowIncompatibleChanges()` is called.
   2- Default value for `schema evolution` is the same as the one for `INSERT 
INTO`. The `schema evolution` instance is materialized right away through a full 
rewrite upon the DDL, allowing all other instances (i.e., `INSERT INTO`) to 
change. From the user's point of view, the default value (regardless of which 
instance it is) just changes without problems.
   
   In both cases, we do not support the `INSERT INTO` use case yet.
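   
   As a rough illustration of why option 1 needs the guard, here is a small 
sketch of a single default that is resolved at read time for files lacking the 
column (no rewrite). `SingleDefaultColumn` and the `allow_incompatible_changes` 
flag are hypothetical stand-ins for illustration, not the actual Iceberg API.

```python
# Toy model of option 1 (hypothetical names, not Iceberg API): one default
# value, applied at read time to rows from files written before the column
# existed. Nothing is rewritten, so old files never contain the column.
class SingleDefaultColumn:
    def __init__(self, name, default):
        self.name = name
        self.default = default

    def set_default(self, value, allow_incompatible_changes=False):
        # Changing the default silently changes what old rows read as,
        # so it is refused unless the caller opts in explicitly.
        if not allow_incompatible_changes:
            raise ValueError("changing the default alters existing rows")
        self.default = value

    def read(self, file_row):
        # Rows that physically contain the column return the stored value;
        # rows from older files fall back to the current default.
        return file_row.get(self.name, self.default)


col = SingleDefaultColumn("col", 1)
old_row = {"id": 1}            # written before the column was added
new_row = {"id": 2, "col": 7}  # written after, value stored in the file

before = col.read(old_row)
col.set_default(3, allow_incompatible_changes=True)
after = col.read(old_row)
print(before, after, col.read(new_row))
```

   Without a rewrite, the old row reads differently before and after the 
change, which is the incompatibility the guard is meant to surface.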




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


