codope commented on code in PR #7856:
URL: https://github.com/apache/hudi/pull/7856#discussion_r1096724818
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -455,7 +455,22 @@ object HoodieSparkSqlWriter {
// w/ the table's one and allow schemas to diverge. This is required in cases where
// partial updates will be performed (for ex, `MERGE INTO` Spark SQL statement) and as such
// only incoming dataset's projection has to match the table's schema, and not the whole one
- if (!shouldValidateSchemasCompatibility || isSchemaCompatible(latestTableSchema, canonicalizedSourceSchema, allowAutoEvolutionColumnDrop)) {
+
+ if (!shouldValidateSchemasCompatibility) {
+ // if no validation is enabled, check for col drop
+ // if col drop is allowed, go ahead. if not, check for projection, so that we do not allow dropping cols
+ if (allowAutoEvolutionColumnDrop || canProject(latestTableSchema, canonicalizedSourceSchema)) {
Review Comment:
`canProject` will return false if column names differ, and ingestion will
fail. However, we do allow `MERGE INTO` with different source column names, and we
even allow the same column name in a different case.
This changes the behavior compared to the previous release. Is this
something we should do? If yes, what happens when users come back asking for a
workaround to unblock their pipelines? We would need to tell them to set
`hoodie.datasource.write.schema.allow.auto.evolution.column.drop=true`, which
doesn't sound intuitive. Wdyt?
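
To make the concern concrete, here is a rough sketch of the gate as I read it. It is not Hudi's actual `canProject`: `canProjectByName` and `passesGate` are hypothetical stand-ins, schemas are reduced to plain lists of column names, and I am assuming the check is a case-sensitive name match that requires every table column to be present in the incoming schema.

```scala
// Simplified model of the new condition in the diff. NOT Hudi's implementation.
object ProjectionGateSketch {

  // Hypothetical stand-in for `canProject`: true only if every table column
  // appears in the incoming columns under exactly the same (case-sensitive) name.
  def canProjectByName(tableCols: Seq[String], incomingCols: Seq[String]): Boolean =
    tableCols.forall(c => incomingCols.contains(c))

  // Mirrors the shape of the inner `if` in the "validation disabled" branch:
  // allowAutoEvolutionColumnDrop || canProject(...)
  def passesGate(allowColumnDrop: Boolean,
                 tableCols: Seq[String],
                 incomingCols: Seq[String]): Boolean =
    allowColumnDrop || canProjectByName(tableCols, incomingCols)

  def main(args: Array[String]): Unit = {
    val table = Seq("id", "name", "ts")

    // Identical column names: the write proceeds.
    println(passesGate(false, table, Seq("id", "name", "ts")))      // true

    // Same column, different case: rejected under this model.
    println(passesGate(false, table, Seq("id", "name", "TS")))      // false

    // Differently named source column (as in a MERGE INTO source): rejected.
    println(passesGate(false, table, Seq("id", "full_name", "ts"))) // false

    // The only way through is the column-drop flag, which is the
    // unintuitive workaround mentioned above.
    println(passesGate(true, table, Seq("id", "full_name", "ts")))  // true
  }
}
```

Under these assumptions, the second and third cases worked in the previous release and would now fail unless the column-drop flag is set.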