[GitHub] [hudi] codope commented on a diff in pull request #7856: [HUDI-5704] De-coupling column drop flag and schema validation flag (0.13.0)

via GitHub Sun, 05 Feb 2023 07:36:11 -0800


codope commented on code in PR #7856:
URL: https://github.com/apache/hudi/pull/7856#discussion_r1096724818



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -455,7 +455,22 @@ object HoodieSparkSqlWriter {
           //       w/ the table's one and allow schemas to diverge. This is required in cases where
           //       partial updates will be performed (for ex, `MERGE INTO` Spark SQL statement) and as such
           //       only incoming dataset's projection has to match the table's schema, and not the whole one
-          if (!shouldValidateSchemasCompatibility || isSchemaCompatible(latestTableSchema, canonicalizedSourceSchema, allowAutoEvolutionColumnDrop)) {
+
+          if (!shouldValidateSchemasCompatibility) {
+            // if no validation is enabled, check for col drop
+            // if col drop is allowed, go ahead. if not, check for projection, so that we do not allow dropping cols
+            if (allowAutoEvolutionColumnDrop || canProject(latestTableSchema, canonicalizedSourceSchema)) {

Review Comment:
   `canProject` will return false if column names differ, and ingestion will 
fail. However, we do allow `MERGE INTO` with different source column names, and 
we even allow the same column name with different case. 
   This changes the behavior compared to the previous release. Is this 
something we should do? If yes, what if users come back asking for a 
workaround to unblock their pipelines? We would then need to tell them to set 
`hoodie.datasource.write.schema.allow.auto.evolution.column.drop=true`, which 
doesn't sound intuitive. Wdyt?
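
   To make the concern concrete, here is a minimal sketch of the new branch. It 
is a toy model, not Hudi's code: a schema is reduced to a list of column names, 
and `canProjectByName` is a hypothetical, case-sensitive stand-in for 
`canProject` (the real check works on Avro schemas and may differ in detail). 
`passesWhenValidationDisabled` is likewise a made-up name for the condition in 
the diff above.

```scala
object SchemaGateSketch {
  // Assumption: a schema is modelled as just its column names.
  type Columns = Seq[String]

  // Hypothetical stand-in for canProject: every table column must be present
  // in the source under exactly the same (case-sensitive) name.
  def canProjectByName(tableCols: Columns, sourceCols: Columns): Boolean =
    tableCols.forall(sourceCols.contains)

  // Mirrors the condition added in the diff for the branch where
  // schema validation is disabled.
  def passesWhenValidationDisabled(allowColumnDrop: Boolean,
                                   tableCols: Columns,
                                   sourceCols: Columns): Boolean =
    allowColumnDrop || canProjectByName(tableCols, sourceCols)
}
```

   Under these assumptions, a source with columns `id, Name` or `key, name` is 
rejected against a table with `id, name` unless the column drop flag is set, 
even though no column is actually being dropped.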
   



