yihua commented on code in PR #10677:
URL: https://github.com/apache/hudi/pull/10677#discussion_r1508106526
##########
hudi-common/src/main/java/org/apache/hudi/avro/AvroSchemaCompatibility.java:
##########
@@ -283,6 +284,35 @@ private SchemaCompatibilityResult getCompatibility(final Schema reader,
return result;
}
+ private static String getLocationName(final Deque<LocationInfo> locations, Type readerType) {
+ StringBuilder sb = new StringBuilder();
+ Iterator<LocationInfo> locationInfoIterator = locations.iterator();
+ boolean addDot = false;
+ while (locationInfoIterator.hasNext()) {
+ if (addDot) {
+ sb.append(".");
+ } else {
+ addDot = true;
+ }
+ LocationInfo next = locationInfoIterator.next();
+ sb.append(next.name);
+ //we check the reader type if we are at the last location. This is because
+ //if the type is array/map, that means the problem is that the field type
+ //of the writer is not array/map. If the type is something else, the problem
+ //is between the array element/map value of the reader and writer schemas
Review Comment:
nit: would be good to give a couple of examples that illustrate the input
and expected output.
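For instance, such an example could look like the following simplified sketch (the class name and the use of plain strings in place of `LocationInfo` are hypothetical, for illustration only): given the nested locations `rider`, `address`, `zip`, the method would produce the dotted path `rider.address.zip`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class LocationNameExample {

  // Simplified sketch of the dotted-path construction in getLocationName,
  // using plain strings instead of LocationInfo and omitting the
  // reader-type check on the last location.
  static String getLocationName(Deque<String> locations) {
    StringBuilder sb = new StringBuilder();
    Iterator<String> it = locations.iterator();
    boolean addDot = false;
    while (it.hasNext()) {
      if (addDot) {
        sb.append(".");
      } else {
        addDot = true;
      }
      sb.append(it.next());
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Nested field locations collected while walking the schemas.
    Deque<String> locations = new ArrayDeque<>(List.of("rider", "address", "zip"));
    System.out.println(getLocationName(locations)); // rider.address.zip
  }
}
```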
##########
hudi-common/src/main/java/org/apache/hudi/avro/AvroSchemaUtils.java:
##########
@@ -428,25 +408,95 @@ public static void checkSchemaCompatible(
boolean allowProjection,
Set<String> dropPartitionColNames) throws SchemaCompatibilityException {
- String errorMessage = null;
-
- if (!allowProjection && !canProject(tableSchema, writerSchema, dropPartitionColNames)) {
- errorMessage = "Column dropping is not allowed";
+ if (!allowProjection) {
+ List<Schema.Field> missingFields = findMissingFields(tableSchema, writerSchema, dropPartitionColNames);
+ if (!missingFields.isEmpty()) {
+ throw new MissingSchemaFieldException(missingFields.stream().map(Schema.Field::name).collect(Collectors.toList()));
+ }
}
// TODO(HUDI-4772) re-enable validations in case partition columns
// being dropped from the data-file after fixing the write schema
- if (dropPartitionColNames.isEmpty() && shouldValidate && !isSchemaCompatible(tableSchema, writerSchema)) {
- errorMessage = "Failed schema compatibility check";
+ if (dropPartitionColNames.isEmpty() && shouldValidate) {
+ AvroSchemaCompatibility.SchemaPairCompatibility result =
+ AvroSchemaCompatibility.checkReaderWriterCompatibility(writerSchema, tableSchema, true);
+ if (result.getType() != AvroSchemaCompatibility.SchemaCompatibilityType.COMPATIBLE) {
+ throw new SchemaBackwardsCompatibilityException(result);
+ }
}
+ }
- if (errorMessage != null) {
- String errorDetails = String.format(
- "%s\nwriterSchema: %s\ntableSchema: %s",
- errorMessage,
- writerSchema,
- tableSchema);
- throw new SchemaCompatibilityException(errorDetails);
+ /**
+ * Validate whether the {@code incomingSchema} is a valid evolution of {@code tableSchema}.
+ *
+ * @param incomingSchema schema of the incoming dataset
+ * @param tableSchema latest table schema
+ */
+ public static void checkValidEvolution(Schema incomingSchema, Schema tableSchema) {
Review Comment:
Adding to that, my point is we should print both the specific context of
what's causing the incompatibility and both schemas (writer schema and
table schema). The former helps the user pinpoint the specific issue; the
latter helps users see the whole schema when the specific issue alone is
not clear, and helps devs track down bugs in the schema incompatibility
check if they occur.
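As a rough sketch of the suggested error-message shape (class, method, and context values here are hypothetical, not the actual PR code): the specific incompatibility context comes first, followed by both full schemas.

```java
public class SchemaErrorMessageExample {

  // Hypothetical formatter illustrating the suggested error shape:
  // the specific incompatibility context first, then both full schemas,
  // so users can locate the issue and devs can inspect the whole schemas.
  static String formatIncompatibilityError(String context, String writerSchema, String tableSchema) {
    return String.format(
        "Schema incompatibility at %s%nwriterSchema: %s%ntableSchema: %s",
        context, writerSchema, tableSchema);
  }

  public static void main(String[] args) {
    String msg = formatIncompatibilityError(
        "rider.address.zip",                  // dotted location of the mismatch
        "{\"type\":\"record\", \"...\": 0}",  // writer schema (placeholder)
        "{\"type\":\"record\", \"...\": 0}"); // table schema (placeholder)
    System.out.println(msg);
  }
}
```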
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]