trushev commented on code in PR #7895:
URL: https://github.com/apache/hudi/pull/7895#discussion_r1122654358
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -589,11 +589,11 @@ object HoodieSparkSqlWriter {
     if (isCompatibleProjectionOf(tableSchema, newSchema)) {
       // Picking table schema as a writer schema we need to validate that we'd be able to
       // rewrite incoming batch's data (written in new schema) into it
-      (tableSchema, isSchemaCompatible(newSchema, tableSchema, true))
Review Comment:
I think we do. The check in `#deduceWriterSchema` is performed at the RDD-creation stage, while `#validateSchema` is part of the write stage. The first check lets us identify a schema incompatibility in advance, before the write stage starts.
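To make the fail-fast idea concrete, here is a minimal, self-contained sketch (plain Avro, not the Hudi code; the class and method names are illustrative) of validating reader/writer compatibility up front, so an incompatible incoming schema aborts before any write work is scheduled:

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;
import org.apache.avro.SchemaCompatibility.SchemaCompatibilityType;
import org.apache.avro.SchemaCompatibility.SchemaPairCompatibility;

public final class FailFastSchemaCheck {

  /** Throws early if records written with newSchema could not be read back with tableSchema. */
  static void checkCompatibleOrFail(Schema tableSchema, Schema newSchema) {
    SchemaPairCompatibility result =
        SchemaCompatibility.checkReaderWriterCompatibility(tableSchema, newSchema);
    if (result.getType() != SchemaCompatibilityType.COMPATIBLE) {
      // Failing here, at planning time, avoids scheduling a write that is doomed anyway.
      throw new IllegalArgumentException(
          "Incoming schema is incompatible with the table schema: " + result.getDescription());
    }
  }

  public static void main(String[] args) {
    Schema table = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"r\",\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}");
    Schema incoming = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"r\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}");
    checkCompatibleOrFail(table, incoming); // throws: string cannot be promoted to long
  }
}
```

In Hudi the analogous early check is the `isSchemaCompatible(newSchema, tableSchema, true)` call inside `#deduceWriterSchema`, as shown in the diff above.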
##########
hudi-common/src/main/java/org/apache/hudi/avro/AvroSchemaUtils.java:
##########
@@ -76,7 +87,18 @@ public static boolean isSchemaCompatible(Schema prevSchema, Schema newSchema, bo
* @return true if prev schema is a projection of new schema.
*/
   public static boolean canProject(Schema prevSchema, Schema newSchema) {
+    return canProject(prevSchema, newSchema, Collections.emptySet());
+  }
+
+  /**
+   * Checks that each field in the prevSchema can be populated in the newSchema, except for the specified columns.
+   *
+   * @param prevSchema prev schema.
+   * @param newSchema  new schema.
+   * @param exceptCols column names excluded from the check.
+   * @return true if prev schema is a projection of new schema.
+   */
+  public static boolean canProject(Schema prevSchema, Schema newSchema, Collection<String> exceptCols) {
Review Comment:
Done.
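For context, here is a hedged sketch of what this overload could do. The diff above shows only the signature; the body below is an assumption and checks field presence only, whereas the real projection check would also validate field-level type compatibility:

```java
import java.util.Collection;
import java.util.Collections;
import org.apache.avro.Schema;

public final class ProjectionCheckSketch {

  // Default overload, matching the delegation shown in the diff above.
  public static boolean canProject(Schema prevSchema, Schema newSchema) {
    return canProject(prevSchema, newSchema, Collections.emptySet());
  }

  // Assumed body: every field of prevSchema (a record schema) must exist in
  // newSchema, skipping the columns listed in exceptCols.
  public static boolean canProject(Schema prevSchema, Schema newSchema,
                                   Collection<String> exceptCols) {
    return prevSchema.getFields().stream()
        .filter(field -> !exceptCols.contains(field.name()))
        // Presence-only check; a full check would also compare field types.
        .allMatch(field -> newSchema.getField(field.name()) != null);
  }
}
```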