yihua commented on code in PR #10677:
URL: https://github.com/apache/hudi/pull/10677#discussion_r1508106526
##########
hudi-common/src/main/java/org/apache/hudi/avro/AvroSchemaCompatibility.java:
##########
@@ -283,6 +284,35 @@ private SchemaCompatibilityResult getCompatibility(final Schema reader,
return result;
}
+ private static String getLocationName(final Deque<LocationInfo> locations, Type readerType) {
+ StringBuilder sb = new StringBuilder();
+ Iterator<LocationInfo> locationInfoIterator = locations.iterator();
+ boolean addDot = false;
+ while (locationInfoIterator.hasNext()) {
+ if (addDot) {
+ sb.append(".");
+ } else {
+ addDot = true;
+ }
+ LocationInfo next = locationInfoIterator.next();
+ sb.append(next.name);
+ //we check the reader type if we are at the last location. This is because
+ //if the type is array/map, that means the problem is that the field type
+ //of the writer is not array/map. If the type is something else, the problem
+ //is between the array element/map value of the reader and writer schemas
Review Comment:
nit: would be good to give a couple of examples that illustrate the input
and expected output.
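For instance, such an example could look like the following simplified sketch (the class name and the use of plain strings in place of `LocationInfo` are hypothetical, for illustration only): given the nested locations `rider`, `address`, `zip`, the method would produce the dotted path `rider.address.zip`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class LocationNameExample {

  // Simplified sketch of the dotted-path construction in getLocationName,
  // using plain strings instead of LocationInfo and omitting the
  // reader-type check on the last location.
  static String getLocationName(Deque<String> locations) {
    StringBuilder sb = new StringBuilder();
    Iterator<String> it = locations.iterator();
    boolean addDot = false;
    while (it.hasNext()) {
      if (addDot) {
        sb.append(".");
      } else {
        addDot = true;
      }
      sb.append(it.next());
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Nested field locations collected while walking the schemas.
    Deque<String> locations = new ArrayDeque<>(List.of("rider", "address", "zip"));
    System.out.println(getLocationName(locations)); // rider.address.zip
  }
}
```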
##########
hudi-common/src/main/java/org/apache/hudi/avro/AvroSchemaUtils.java:
##########
@@ -428,25 +408,95 @@ public static void checkSchemaCompatible(
boolean allowProjection,
Set<String> dropPartitionColNames) throws SchemaCompatibilityException {
- String errorMessage = null;
-
- if (!allowProjection && !canProject(tableSchema, writerSchema, dropPartitionColNames)) {
- errorMessage = "Column dropping is not allowed";
+ if (!allowProjection) {
+ List<Schema.Field> missingFields = findMissingFields(tableSchema, writerSchema, dropPartitionColNames);
+ if (!missingFields.isEmpty()) {
+ throw new MissingSchemaFieldException(missingFields.stream().map(Schema.Field::name).collect(Collectors.toList()));
+ }
}
// TODO(HUDI-4772) re-enable validations in case partition columns
// being dropped from the data-file after fixing the write schema
- if (dropPartitionColNames.isEmpty() && shouldValidate && !isSchemaCompatible(tableSchema, writerSchema)) {
- errorMessage = "Failed schema compatibility check";
+ if (dropPartitionColNames.isEmpty() && shouldValidate) {
+ AvroSchemaCompatibility.SchemaPairCompatibility result =
+ AvroSchemaCompatibility.checkReaderWriterCompatibility(writerSchema, tableSchema, true);
+ if (result.getType() != AvroSchemaCompatibility.SchemaCompatibilityType.COMPATIBLE) {
+ throw new SchemaBackwardsCompatibilityException(result);
+ }
}
+ }
- if (errorMessage != null) {
- String errorDetails = String.format(
- "%s\nwriterSchema: %s\ntableSchema: %s",
- errorMessage,
- writerSchema,
- tableSchema);
- throw new SchemaCompatibilityException(errorDetails);
+ /**
+ * Validate whether the {@code incomingSchema} is a valid evolution of {@code tableSchema}.
+ *
+ * @param incomingSchema schema of the incoming dataset
+ * @param tableSchema latest table schema
+ */
+ public static void checkValidEvolution(Schema incomingSchema, Schema tableSchema) {
Review Comment:
Adding to that, my point is we should print both the specific context of
what's causing the incompatibility and both schemas (writer schema and
table schema). The former helps the user pinpoint the specific issue; the
latter helps users see the whole schema when the specific issue alone is
not clear, and helps devs track down bugs in the schema incompatibility
check if they occur.
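As a rough sketch of the suggested error-message shape (class, method, and context values here are hypothetical, not the actual PR code): the specific incompatibility context comes first, followed by both full schemas.

```java
public class SchemaErrorMessageExample {

  // Hypothetical formatter illustrating the suggested error shape:
  // the specific incompatibility context first, then both full schemas,
  // so users can locate the issue and devs can inspect the whole schemas.
  static String formatIncompatibilityError(String context, String writerSchema, String tableSchema) {
    return String.format(
        "Schema incompatibility at %s%nwriterSchema: %s%ntableSchema: %s",
        context, writerSchema, tableSchema);
  }

  public static void main(String[] args) {
    String msg = formatIncompatibilityError(
        "rider.address.zip",                  // dotted location of the mismatch
        "{\"type\":\"record\", \"...\": 0}",  // writer schema (placeholder)
        "{\"type\":\"record\", \"...\": 0}"); // table schema (placeholder)
    System.out.println(msg);
  }
}
```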
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]