RussellSpitzer commented on a change in pull request #4301:
URL: https://github.com/apache/iceberg/pull/4301#discussion_r823176208



##########
File path: format/spec.md
##########
@@ -193,10 +193,38 @@ Notes:
 
 For details on how to serialize a schema to JSON, see Appendix C.
 
+#### Default value
+Default values can be assigned to top-level columns or nested fields. Default 
values are used during schema evolution when adding a new column. The default 
value is used to read rows belonging to the files that lack the column or 
nested field prior to the schema evolution.

Review comment:
       I think we have to hit a few points very clearly here.
   
   1. Default values can be assigned to any newly created column in an Iceberg 
Table
   2. Default values are only used for files which are missing the new column. 
(Do we explicitly require the files to have been written before the default 
value was set? If I import a new file directly into the Iceberg table, does it 
get the default? I think it's probably easiest to say yes.)
   3. New writes to the table do not use Default Values
      a. Required Columns are still required at write time
      b. Optional Columns are still persisted as null if they are missing at 
write time
   4. Changing a default value is basically undefined behavior
      a. Rows which have been rewritten will permanently keep the default value 
which was set when the row was rewritten
      b. Rows which have not been rewritten will adjust to the new default value
      c. Because of this we lock changing defaults behind "allowIncompatible 
changes"
   5. Name mappings are always used in place of defaults. If a file is added 
without Iceberg column IDs (via import or whatnot), we will always use name 
mapping before attempting to use a column default value.
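
To make points 2 through 4 concrete, here is a rough sketch of the semantics I have in mind. It is a toy in-memory model, not Iceberg code: `Field`, `read_file`, and `write_row` are hypothetical names made up for illustration, a "file" is just a list of rows keyed by field ID plus the set of field IDs it contains, and name mapping (point 5) is not modeled.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Set

# Sentinel so "no default configured" is distinct from "default is None".
_NO_DEFAULT = object()


@dataclass
class Field:
    """Hypothetical schema field; not an Iceberg class."""
    field_id: int
    name: str
    required: bool = False
    default: Any = _NO_DEFAULT


def read_file(rows: List[Dict[int, Any]], file_field_ids: Set[int],
              schema: List[Field]) -> List[Dict[str, Any]]:
    """Read path: the default is used only when the file lacks the column."""
    records = []
    for row in rows:
        record = {}
        for f in schema:
            if f.field_id in file_field_ids:
                # Column exists in the file: the stored value wins,
                # even when it is an explicit null.
                record[f.name] = row.get(f.field_id)
            elif f.default is not _NO_DEFAULT:
                # Column is missing from the file: fall back to the default.
                record[f.name] = f.default
            else:
                # Missing column with no default reads as null in this toy model.
                record[f.name] = None
        records.append(record)
    return records


def write_row(values: Dict[str, Any], schema: List[Field]) -> Dict[int, Any]:
    """Write path: defaults are never applied (point 3)."""
    row = {}
    for f in schema:
        if f.name in values:
            row[f.field_id] = values[f.name]
        elif f.required:
            # 3a: required columns are still required at write time.
            raise ValueError(f"missing required column: {f.name}")
        else:
            # 3b: optional columns are persisted as null when missing.
            row[f.field_id] = None
    return row


# "score" (field ID 2) is added after old_rows was written, with default 0.
schema = [Field(1, "id", required=True), Field(2, "score", default=0)]
old_rows, old_ids = [{1: 10}], {1}

before_change = read_file(old_rows, old_ids, schema)

# Rewriting the file materializes the default that was current at rewrite time.
rewritten = [write_row(r, schema) for r in before_change]

# Point 4: change the default afterwards.
schema[1].default = 5
from_rewritten = read_file(rewritten, {1, 2}, schema)   # keeps the old default
from_old_file = read_file(old_rows, old_ids, schema)    # picks up the new default
```

After the default changes, the same logical row reads differently depending on whether its file was rewritten, which is the inconsistency described in 4a and 4b.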
      
    I think the current text is good but can probably be tightened up.
    
    For example:
    
    "Default values can be assigned to any new top-level column added to an 
Iceberg table." instead of the first two sentences
    "Default values are only used when reading a data file which is missing the 
column with the default."
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


