kazdy commented on issue #7266: URL: https://github.com/apache/hudi/issues/7266#issuecomment-1322719011
I think the issue is here:

https://github.com/apache/hudi/blob/b662cf6789cbefb4af3c10dc544c1040ff57cbb3/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala#L140-L161

In this case, `isNonStrictMode` is `false` and `hasPrecombineColumn` is `false`, so the match falls through to `INSERT_OPERATION_OPT_VAL`. Hudi then uses `OverwriteWithLatestAvroPayload` and does not validate for duplicate keys, even though `strict` insert mode is being used:

https://github.com/apache/hudi/blob/b662cf6789cbefb4af3c10dc544c1040ff57cbb3/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala#L163-L170

This behavior is inconsistent with the documentation, which says nothing about preCombineField:

> For strict mode, insert statement will keep the primary key uniqueness constraint which do not allow duplicate record.

https://hudi.apache.org/docs/configurations#hoodiesqlinsertmode

cc @jonvex since we talked about this on slack
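To make the problem concrete, here is a simplified, self-contained sketch of the operation-selection logic as I read the linked lines. The names are paraphrased from `ProvidesHoodieConfig.scala`, not the exact code, and the real match also considers other inputs; this only models the two flags discussed above:

```scala
// Hypothetical sketch (NOT the actual Hudi code) of how the insert
// operation appears to be deduced in ProvidesHoodieConfig.scala.
object InsertModeSketch {
  val INSERT_OPERATION_OPT_VAL = "insert"
  val UPSERT_OPERATION_OPT_VAL = "upsert"

  // Paraphrased decision: strict mode only leads to an upsert (and thus
  // key-based deduplication) when a precombine column is present.
  def deduceOperation(isNonStrictMode: Boolean, hasPrecombineColumn: Boolean): String =
    if (isNonStrictMode) INSERT_OPERATION_OPT_VAL
    else if (hasPrecombineColumn) UPSERT_OPERATION_OPT_VAL
    else INSERT_OPERATION_OPT_VAL // strict mode, no precombine: still "insert",
                                  // so duplicate keys are never validated

  def main(args: Array[String]): Unit = {
    // The problematic case from this issue: strict mode + no precombine column.
    assert(deduceOperation(isNonStrictMode = false, hasPrecombineColumn = false) == "insert")
    // With a precombine column, strict mode does take the upsert path.
    assert(deduceOperation(isNonStrictMode = false, hasPrecombineColumn = true) == "upsert")
    println("ok")
  }
}
```

The third branch is the one this issue is about: strict mode without a precombine column silently behaves like a plain insert.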
