emkornfield commented on code in PR #14266:
URL: https://github.com/apache/iceberg/pull/14266#discussion_r2539445555
##########
api/src/main/java/org/apache/iceberg/types/CheckCompatibility.java:
##########
@@ -38,7 +38,21 @@ public class CheckCompatibility extends
TypeUtil.CustomOrderSchemaVisitor<List<S
* @return a list of error details, or an empty list if there are no
compatibility problems
*/
public static List<String> writeCompatibilityErrors(Schema readSchema,
Schema writeSchema) {
- return writeCompatibilityErrors(readSchema, writeSchema, true);
+ return writeCompatibilityErrors(readSchema, writeSchema, true, 2);
+ }
+
+ /**
+ * Returns a list of compatibility errors for writing with the given write
schema. This includes
+ * nullability: writing optional (nullable) values to a required field is an
error.
+ *
+ * @param readSchema a read schema
+ * @param writeSchema a write schema
+ * @param formatVersion the table format version
+ * @return a list of error details, or an empty list if there are no
compatibility problems
+ */
+ public static List<String> writeCompatibilityErrors(
+ Schema readSchema, Schema writeSchema, int formatVersion) {
+ return writeCompatibilityErrors(readSchema, writeSchema, true,
formatVersion);
Review Comment:
I'm not sure if it is done in this code base but commenting the literal
`/*checkOrdering=*/true` can help readability.
##########
api/src/main/java/org/apache/iceberg/types/CheckCompatibility.java:
##########
@@ -53,7 +67,24 @@ public static List<String> writeCompatibilityErrors(Schema
readSchema, Schema wr
*/
public static List<String> writeCompatibilityErrors(
Schema readSchema, Schema writeSchema, boolean checkOrdering) {
- return TypeUtil.visit(readSchema, new CheckCompatibility(writeSchema,
checkOrdering, true));
+ return writeCompatibilityErrors(readSchema, writeSchema, checkOrdering, 2);
+ }
+
+ /**
+ * Returns a list of compatibility errors for writing with the given write
schema. This includes
+ * nullability: writing optional (nullable) values to a required field is an
error Optionally this
Review Comment:
```suggestion
* nullability: writing optional (nullable) values to a required field is
an error. This method allows
```
##########
core/src/main/java/org/apache/iceberg/schema/UnionByNameVisitor.java:
##########
@@ -204,7 +222,8 @@ private boolean isIgnorableTypeUpdate(Type existingType,
Type newType) {
// existingType:long -> newType:int returns true, meaning it is ignorable
// existingType:int -> newType:long returns false, meaning it is not
ignorable
return newType.isPrimitiveType()
- && TypeUtil.isPromotionAllowed(newType,
existingType.asPrimitiveType());
+ && TypeUtil.isPromotionAllowed(
+ newType, existingType.asPrimitiveType(), formatVersion, false);
Review Comment:
comment on why/what the false literal is here?
##########
api/src/main/java/org/apache/iceberg/types/CheckCompatibility.java:
##########
@@ -53,7 +67,24 @@ public static List<String> writeCompatibilityErrors(Schema
readSchema, Schema wr
*/
public static List<String> writeCompatibilityErrors(
Schema readSchema, Schema writeSchema, boolean checkOrdering) {
- return TypeUtil.visit(readSchema, new CheckCompatibility(writeSchema,
checkOrdering, true));
+ return writeCompatibilityErrors(readSchema, writeSchema, checkOrdering, 2);
+ }
+
+ /**
+ * Returns a list of compatibility errors for writing with the given write
schema. This includes
+ * nullability: writing optional (nullable) values to a required field is an
error Optionally this
+ * method allows case where input schema has different ordering than table
schema.
+ *
+ * @param readSchema a read schema
+ * @param writeSchema a write schema
+ * @param checkOrdering If false, allow input schema to have different
ordering than table schema
Review Comment:
nit: what is input schema? read schema?
##########
api/src/main/java/org/apache/iceberg/types/CheckCompatibility.java:
##########
@@ -38,7 +38,21 @@ public class CheckCompatibility extends
TypeUtil.CustomOrderSchemaVisitor<List<S
* @return a list of error details, or an empty list if there are no
compatibility problems
*/
public static List<String> writeCompatibilityErrors(Schema readSchema,
Schema writeSchema) {
- return writeCompatibilityErrors(readSchema, writeSchema, true);
+ return writeCompatibilityErrors(readSchema, writeSchema, true, 2);
Review Comment:
It seems like a potential foot-gun to leave methods that hardcode the
version default in place? Is this common practice in the code base (ideally
we would at least mark these methods as deprecated). Maybe as a follow-up if
this causes a lot of errors?
##########
core/src/main/java/org/apache/iceberg/SchemaUpdate.java:
##########
@@ -280,8 +301,17 @@ public UpdateSchema updateColumn(String name,
Type.PrimitiveType newType) {
return this;
}
+ // If field is listed in source-ids, we need to flag it for promoting date
-> timestamp.
+ List<PartitionField> partitionFields =
+ this.base != null
+ ? this.base.spec().getFieldsBySourceId(field.fieldId())
+ : Lists.newArrayList();
+
+ boolean isBucketPartitioned =
+ partitionFields.stream().anyMatch(pf ->
pf.transform().toString().startsWith("bucket["));
+
Preconditions.checkArgument(
- TypeUtil.isPromotionAllowed(field.type(), newType),
+ TypeUtil.isPromotionAllowed(field.type(), newType, formatVersion,
isBucketPartitioned),
Review Comment:
I wonder if a more stable API might be to pass through all transforms that
apply to the field (I guess bucket is the riskiest one). As long as this isn't
public we probably don't need to concern ourselves too much about it.
##########
api/src/main/java/org/apache/iceberg/types/CheckCompatibility.java:
##########
@@ -53,7 +67,24 @@ public static List<String> writeCompatibilityErrors(Schema
readSchema, Schema wr
*/
public static List<String> writeCompatibilityErrors(
Schema readSchema, Schema writeSchema, boolean checkOrdering) {
- return TypeUtil.visit(readSchema, new CheckCompatibility(writeSchema,
checkOrdering, true));
+ return writeCompatibilityErrors(readSchema, writeSchema, checkOrdering, 2);
Review Comment:
same comment on commenting literal (or maybe there are already defined
constants someplace in the code base)?
##########
api/src/main/java/org/apache/iceberg/types/CheckCompatibility.java:
##########
@@ -53,7 +67,24 @@ public static List<String> writeCompatibilityErrors(Schema
readSchema, Schema wr
*/
public static List<String> writeCompatibilityErrors(
Schema readSchema, Schema writeSchema, boolean checkOrdering) {
- return TypeUtil.visit(readSchema, new CheckCompatibility(writeSchema,
checkOrdering, true));
+ return writeCompatibilityErrors(readSchema, writeSchema, checkOrdering, 2);
+ }
+
+ /**
+ * Returns a list of compatibility errors for writing with the given write
schema. This includes
+ * nullability: writing optional (nullable) values to a required field is an
error Optionally this
+ * method allows case where input schema has different ordering than table
schema.
Review Comment:
```suggestion
* configuring whether different orderings between schema is considered an
error.
```
##########
core/src/main/java/org/apache/iceberg/SchemaUpdate.java:
##########
@@ -280,8 +301,17 @@ public UpdateSchema updateColumn(String name,
Type.PrimitiveType newType) {
return this;
}
+ // If field is listed in source-ids, we need to flag it for promoting date
-> timestamp.
+ List<PartitionField> partitionFields =
+ this.base != null
+ ? this.base.spec().getFieldsBySourceId(field.fieldId())
+ : Lists.newArrayList();
+
+ boolean isBucketPartitioned =
+ partitionFields.stream().anyMatch(pf ->
pf.transform().toString().startsWith("bucket["));
Review Comment:
is there not better utility methods for detecting this in an already parsed
form)?
##########
core/src/main/java/org/apache/iceberg/SchemaUpdate.java:
##########
@@ -65,27 +65,48 @@ class SchemaUpdate implements UpdateSchema {
private boolean allowIncompatibleChanges = false;
private Set<String> identifierFieldNames;
private boolean caseSensitive = true;
+ private final int formatVersion;
SchemaUpdate(TableOperations ops) {
this(ops, ops.current());
}
+ /** For testing only. */
+ SchemaUpdate(TableMetadata base) {
+ this(null, base, base.schema(), base.lastColumnId(), base.formatVersion());
+ }
+
/** For testing only. */
SchemaUpdate(Schema schema, int lastColumnId) {
- this(null, null, schema, lastColumnId);
+ this(null, null, schema, lastColumnId,
TableProperties.DEFAULT_FORMAT_VERSION);
Review Comment:
should this be marked as deprecated?
##########
api/src/main/java/org/apache/iceberg/types/CheckCompatibility.java:
##########
@@ -84,7 +121,13 @@ public static List<String> typeCompatibilityErrors(
* @return a list of error details, or an empty list if there are no
compatibility problems
*/
public static List<String> typeCompatibilityErrors(Schema readSchema, Schema
writeSchema) {
Review Comment:
same comment on versionless checks.
##########
core/src/main/java/org/apache/iceberg/schema/UnionByNameVisitor.java:
##########
@@ -53,7 +56,17 @@ private UnionByNameVisitor(UpdateSchema api, Schema
partnerSchema, boolean caseS
* @param newSchema a new schema to compare with the existing
*/
public static void visit(UpdateSchema api, Schema existingSchema, Schema
newSchema) {
- visit(api, existingSchema, newSchema, true);
+ visit(api, existingSchema, newSchema, true, 2);
Review Comment:
I'm thinking a literal for FORMAT_V2 would probably help readability.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]