bithw1 commented on issue #17734: URL: https://github.com/apache/hudi/issues/17734#issuecomment-3698566830
Thanks @deepakpanda93. From your experiment, the Spark DataFrame write will not drop duplicates with the following configurations, while Spark SQL will? That's confusing; at least one of the two behaviors must be wrong.

```
set hoodie.spark.sql.insert.into.operation=insert;
set hoodie.datasource.write.insert.drop.duplicates=false;
set hoodie.datasource.write.insert.dup.policy=none;
```

What do you mean by

```
With spark-sql hoodie.datasource.write.insert.dup.policy is used which is none by default so only no action is being taken in this case.
```

Do you mean that when this configuration is set to none (that is, no action is taken), duplicates will be dropped, or that they will not be dropped?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
