kazdy commented on issue #7266:
URL: https://github.com/apache/hudi/issues/7266#issuecomment-1322719011

   I think the issue is here:
   
https://github.com/apache/hudi/blob/b662cf6789cbefb4af3c10dc544c1040ff57cbb3/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala#L140-L161
   
   In this case, `isNonStrictMode` is `false` and `hasPrecombineColumn` is `false`,
   so the match falls through to `INSERT_OPERATION_OPT_VAL`.
   Hudi then uses `OverwriteWithLatestAvroPayload` and does not validate for
   duplicate keys, even though the `strict` insert mode is in effect.
   
https://github.com/apache/hudi/blob/b662cf6789cbefb4af3c10dc544c1040ff57cbb3/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala#L163-L170
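   A hypothetical, simplified model of the decision described above (the names and return values here are illustrative only, not the actual Hudi API) shows why the strict-mode path without a precombine column skips duplicate-key validation:

   ```scala
   // Illustrative sketch of the operation-selection match in
   // ProvidesHoodieConfig.scala; flags mirror the description above,
   // behavior strings are purely for demonstration.
   object InsertModeDemo {
     def chooseBehavior(isNonStrictMode: Boolean,
                        hasPrecombineColumn: Boolean): String =
       (isNonStrictMode, hasPrecombineColumn) match {
         case (true, _)      => "insert, no duplicate check"        // non-strict mode
         case (false, true)  => "insert, duplicate keys validated"  // strict + precombine
         case (false, false) => "insert, no duplicate check"        // strict WITHOUT a
                                                                    // precombine column also
                                                                    // falls through here,
                                                                    // which is the reported bug
       }

     def main(args: Array[String]): Unit =
       // strict mode, no precombine column: duplicates slip through
       println(chooseBehavior(isNonStrictMode = false, hasPrecombineColumn = false))
   }
   ```

   In other words, the presence of a precombine column, not the insert mode alone, is what currently decides whether duplicate keys get validated.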
   
   This behavior is inconsistent with the documentation, which says nothing
   about `preCombineField`:
   
   > For strict mode, insert statement will keep the primary key uniqueness 
constraint which do not allow duplicate record.
   
   https://hudi.apache.org/docs/configurations#hoodiesqlinsertmode
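   For reference, a minimal Spark SQL reproduction of what the docs imply should fail but currently succeeds (table and column names here are made up for illustration; `hoodie.sql.insert.mode` is the config linked above):

   ```sql
   set hoodie.sql.insert.mode = strict;

   -- primaryKey set, but NO preCombineField
   create table t1 (id int, name string) using hudi
   tblproperties (primaryKey = 'id');

   insert into t1 values (1, 'a1');
   -- per the docs, strict mode should reject this duplicate key;
   -- without a preCombineField it is accepted instead
   insert into t1 values (1, 'a1_dup');
   ```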
   
   cc @jonvex since we talked about this on slack
   

