trushev commented on code in PR #7895:
URL: https://github.com/apache/hudi/pull/7895#discussion_r1101343426
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java:
##########
@@ -799,27 +803,38 @@ public TaskContextSupplier getTaskContextSupplier() {
    * GenericRecords with writerSchema. Hence, we need to ensure that this conversion can take place without errors.
    */
   private void validateSchema() throws HoodieUpsertException, HoodieInsertException {
-
-    if (!shouldValidateAvroSchema() || getActiveTimeline().getCommitsTimeline().filterCompletedInstants().empty()) {
+    boolean allowProjection = config.shouldAllowAutoEvolutionColumnDrop();
+    boolean shouldValidate = shouldValidateAvroSchema();
+    if ((allowProjection && !shouldValidate)
+        || getActiveTimeline().getCommitsTimeline().filterCompletedInstants().empty()) {
       // Check not required
       return;
     }
     Schema tableSchema;
     Schema writerSchema;
-    boolean isValid;
+    String errorMessage = null;
     try {
       TableSchemaResolver schemaResolver = new TableSchemaResolver(getMetaClient());
       writerSchema = HoodieAvroUtils.createHoodieWriteSchema(config.getSchema());
-      tableSchema = HoodieAvroUtils.createHoodieWriteSchema(schemaResolver.getTableAvroSchemaWithoutMetadataFields());
-      isValid = isSchemaCompatible(tableSchema, writerSchema, config.shouldAllowAutoEvolutionColumnDrop());
+      tableSchema = HoodieAvroUtils.createHoodieWriteSchema(schemaResolver.getTableAvroSchema(false));
+      if (!allowProjection && !AvroSchemaUtils.canProject(tableSchema, writerSchema)) {
+        errorMessage = String.format("Column dropping is not allowed. Use %s to disable this check", SCHEMA_ALLOW_AUTO_EVOLUTION_COLUMN_DROP.key());
+      } else if (shouldValidate && !isSchemaCompatible(tableSchema, writerSchema)) {
Review Comment:
I must admit that I underestimated this task :)
1) The changed `HoodieTable.validateSchema` leads to 9 failed tests in hudi-spark with ```SchemaCompatibilityException: Column dropping is not allowed```. I'm not sure whether the changed validation or the Spark writer is buggy; I need more time to look into what's going on. The failing situation is sketched right below.
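To make the failure concrete, here is a minimal, self-contained illustration (not Hudi code; the record and field names are made up, and the one-line projection check is a simplification of the real compatibility logic) of a writer schema that drops a column present in the table schema:
```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

public class ColumnDropExample {
  public static void main(String[] args) {
    // Table schema: two columns.
    Schema tableSchema = SchemaBuilder.record("rec").fields()
        .requiredString("id")
        .requiredString("name") // present in the table...
        .endRecord();
    // Writer schema: "name" has been dropped.
    Schema writerSchema = SchemaBuilder.record("rec").fields()
        .requiredString("id") // ...but missing from the writer
        .endRecord();

    // A canProject-style check: every table field must still be
    // obtainable from records written with writerSchema.
    boolean canProject = tableSchema.getFields().stream()
        .allMatch(f -> writerSchema.getField(f.name()) != null);
    System.out.println(canProject); // false -> "Column dropping is not allowed"
  }
}
```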
2) There are conflicting options: `hoodie.datasource.write.drop.partition.columns=true` allows column dropping for partition columns, while the current implementation (before this PR) skips the column-dropping check even though `hoodie.datasource.write.schema.allow.auto.evolution.column.drop=false`.
It looks like we should introduce a new method `canProjectExceptPartCols(prevSchema, newSchema, partCols)`, roughly as sketched below.
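A rough sketch of what that helper might look like (the signature follows the proposal above; `Schema` is Avro's, and the per-field existence check is a simplified stand-in for the full type-compatibility logic in `AvroSchemaUtils`):
```java
import java.util.Set;
import org.apache.avro.Schema;

public final class SchemaProjectionUtils {

  /**
   * Like a plain canProject check, but partition columns are exempt, so
   * hoodie.datasource.write.drop.partition.columns=true no longer conflicts
   * with hoodie.datasource.write.schema.allow.auto.evolution.column.drop=false.
   */
  public static boolean canProjectExceptPartCols(Schema prevSchema,
                                                 Schema newSchema,
                                                 Set<String> partCols) {
    return prevSchema.getFields().stream()
        .filter(f -> !partCols.contains(f.name())) // partition columns may be dropped
        .allMatch(f -> newSchema.getField(f.name()) != null);
  }
}
```
The validation in `HoodieTable.validateSchema` could then pass the table's partition columns to this helper instead of calling the plain `canProject`.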
Maybe you are right that `HoodieTableSink` is a more appropriate place for these changes, to avoid affecting Spark. Or maybe that would be a workaround for the `HoodieTable.validateSchema` problem rather than its solution. I'll try to find out.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]