wmoustafa commented on a change in pull request #4301:
URL: https://github.com/apache/iceberg/pull/4301#discussion_r824020847



##########
File path: format/spec.md
##########
@@ -193,10 +193,38 @@ Notes:
 
 For details on how to serialize a schema to JSON, see Appendix C.
 
+#### Default value
+Default values can be assigned to top-level columns or nested fields. Default values are used during schema evolution when adding a new column. The default value is used to read rows belonging to the files that lack the column or nested field prior to the schema evolution.

Review comment:
       This situation emerges when engines want to support `INSERT INTO` with a 
subset of columns. Note that the columns must already exist. The `schema 
evolution` default value may or may not exist for a column omitted from 
`INSERT INTO`: it exists if the column was added after the table was created 
(i.e., schema evolution), and it does not exist if the column has been there 
since the table was first created (i.e., no schema evolution).
   
   If the `schema evolution` default value exists: I think it is fair to 
_reuse the value_ to fill in the missing column values in `INSERT INTO`.
   
   If the `schema evolution` default value does not exist: I think it is fair 
to _reuse the same place in the metadata_ to define this default value (upon 
table creation this time, and not upon schema evolution). This will never 
conflict with a `schema evolution` default value for the same column, because 
a column that _already exists_ at table creation is never added through schema 
evolution and so never receives one.
   
   Going by the above, I think it is fine to use the same concept for the 
default value to serve both use cases.
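
   To make the two cases concrete, here is a rough Python sketch of the idea. 
It is only an illustration under my own assumptions: `Field`, `fill_missing`, 
and the column names are hypothetical and are not Iceberg classes or APIs, and 
raising an error when no default is defined is just one possible behavior (an 
engine could instead write null for an optional column).

```python
from dataclasses import dataclass
from typing import Any

# Sentinel so that "no default defined" differs from a default of None.
_UNSET = object()


@dataclass
class Field:
    """Hypothetical stand-in for a column's metadata; not an Iceberg class."""
    name: str
    default: Any = _UNSET  # one metadata slot, shared by both use cases


def fill_missing(fields, row):
    """Complete a row written by INSERT INTO with a subset of columns."""
    out = {}
    for f in fields:
        if f.name in row:
            out[f.name] = row[f.name]
        elif f.default is not _UNSET:
            # Same slot whether the default was set at table creation
            # or when the column was added by schema evolution.
            out[f.name] = f.default
        else:
            # Assumption: reject the write when no default is defined.
            raise ValueError(f"no default for column {f.name!r}")
    return out


fields = [
    Field("id"),                     # in the table from the start, no default
    Field("region", default="n/a"),  # in the table from the start, default set at creation
    Field("score", default=0),       # added later, default set by schema evolution
]
completed = fill_missing(fields, {"id": 1})
```

   In this sketch the write path never needs to know how a default came to be 
defined, which is the point of sharing one metadata slot.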
   
   For the current spec, we may state that we do not support this anyway, and 
hence the default value cannot be used on the write path. It can be a future 
extension though.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


