Verest commented on code in PR #14052:
URL: https://github.com/apache/iceberg/pull/14052#discussion_r2342260052
##########
api/src/test/java/org/apache/iceberg/TestSchema.java:
##########
@@ -213,7 +213,7 @@ public void testSupportedInitialDefault(int formatVersion) {
@ParameterizedTest
@FieldSource("org.apache.iceberg.TestHelpers#ALL_VERSIONS")
public void testSupportedWriteDefault(int formatVersion) {
- // only the initial default is a forward-incompatible change
+ // TODO: need to clarify if this needs to be updated to block for <v3
assertThatCode(() -> Schema.checkCompatibility(WRITE_DEFAULT_SCHEMA,
formatVersion))
.doesNotThrowAnyException();
}
Review Comment:
I don't understand why validation was not added for throwing if a
`write-default` was detected within a v1/v2 table's schema. There is some
reasoning around forward compatibility listed
[here](https://github.com/apache/iceberg/pull/11434) for `initial-default`.
I understand the forward-incompatible comment with respect to reads, but
isn't the `write-default` forward-incompatible with respect to writes? E.g. a
v2 engine will not correctly write a query result for a non-provided field that
should default to an existing `write-default`.
Related from the spec: "The write-default is a forward-compatible change
because it is only used at write time. Old writers will fail because the field
is missing." Do old writers not just default the value to null? E.g. `INSERT
INTO <table> (id) VALUES (1)` will put `null` for any non-id column, which in
this case would be incorrect if a `write-default` was set for those columns.
- Edit: this is not the behavior for spark, it throws for missing columns.
Am I correct in that `write-default` validation should be added? If not,
does `Schema.checkCompatibility` only consider reads, am I missing context, etc?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]