rdblue commented on a change in pull request #4301:
URL: https://github.com/apache/iceberg/pull/4301#discussion_r823218816



##########
File path: format/spec.md
##########
@@ -193,10 +193,38 @@ Notes:
 
 For details on how to serialize a schema to JSON, see Appendix C.
 
+#### Default value
+Default values can be assigned to top-level columns or nested fields. Default values are used during schema evolution when adding a new column. The default value is used to read rows belonging to the files that lack the column or nested field prior to the schema evolution.

Review comment:
   1. Agreed.
   2. Yes, we should probably use the default for imported files. But we should
      state that fields must be written even if there is a default value.
   3. I would state this slightly differently:
       * All columns must be written in new data files at write time
       * Optional columns may contain null; required columns must not
       * Optional columns with no default value have an implied default value,
         `null`
       * Writers are allowed to write the default value if there is no data for
         a column
   4. Changing the default value should not be allowed by the spec, similar to
      how the spec currently does not allow adding required columns. I don't
      think that documenting "allow incompatible changes" is a good idea;
      that's just an implementation detail for people who know what they're
      doing. Also, we will need to clarify that you may now add an optional
      column, an optional column with a default value, or a required column
      with a default value. But you cannot add a required column with no
      default value.
   5. Agreed, but I would phrase it differently again. Name mappings are used
      to apply IDs to data files. Default values are applied after that point,
      once the data file has its IDs.
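
To make points 3 to 5 concrete, here is a minimal sketch in Python. Every name in it (`Field`, `can_add_column`, `apply_name_mapping`, `read_value`) is hypothetical and not part of the Iceberg API, and it assumes a row can be modeled as a plain dict. It only illustrates the rules as phrased above, not how any implementation actually does this.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Sentinel so that "no default" differs from an explicit default of None.
_NO_DEFAULT = object()


@dataclass(frozen=True)
class Field:
    """Hypothetical field model for illustration; not an Iceberg class."""
    field_id: int
    name: str
    required: bool
    default: Any = _NO_DEFAULT

    @property
    def has_default(self) -> bool:
        return self.default is not _NO_DEFAULT


def can_add_column(field: Field) -> bool:
    """Point 4: optional columns (with or without a default) and required
    columns with a default may be added; required without a default may not."""
    return (not field.required) or field.has_default


def apply_name_mapping(row_by_name: Dict[str, Any],
                       mapping: Dict[str, int]) -> Dict[int, Any]:
    """Point 5, first step: assign field IDs to a file's columns by name.
    Columns with no entry in the mapping are dropped in this sketch."""
    return {mapping[name]: value
            for name, value in row_by_name.items()
            if name in mapping}


def read_value(field: Field, row_by_id: Dict[int, Any]) -> Any:
    """Points 3 and 5, second step: defaults apply only after IDs exist,
    and only when the file lacks the column entirely."""
    if field.field_id in row_by_id:
        # The column was written, so its stored value (even null) wins.
        return row_by_id[field.field_id]
    if field.has_default:
        return field.default
    if not field.required:
        # Optional column with no default: implied default of null.
        return None
    raise ValueError(f"required field {field.name!r} is missing and has no default")
```

For example, with `region = Field(3, "region", required=True, default="unknown")`, a row from an older file such as `apply_name_mapping({"id": 1, "name": "a"}, {"id": 1, "name": 2})` has no entry for ID 3, so `read_value(region, row)` falls back to the default.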




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


