Re: [I] Don't understand the result [hudi]

via GitHub Mon, 29 Dec 2025 23:50:37 -0800


bithw1 commented on issue #17734:
URL: https://github.com/apache/hudi/issues/17734#issuecomment-3698566830


   Thanks @deepakpanda93 
   
   From your experiment, a Spark DataFrame write will not drop duplicates with the 
following configurations, while Spark SQL will?
   That's confusing. At least one of the two behaviors must be wrong.
   
   ```
   set hoodie.spark.sql.insert.into.operation=insert;
   set hoodie.datasource.write.insert.drop.duplicates=false;
   set hoodie.datasource.write.insert.dup.policy=none;
   ```
   
   What do you mean by the following?
   
   ```
   With spark-sql hoodie.datasource.write.insert.dup.policy is used which is 
none by default so only no action is being taken in this case.
   ```
   
   Do you mean that if this configuration is set to none, that is, no action is 
taken, duplicates will be dropped, or that they will not be dropped?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

