am-cpp commented on issue #2992:
URL: https://github.com/apache/hudi/issues/2992#issuecomment-847745284


   The issue seems to happen only when the **INSERT_DROP_DUPS_OPT_KEY**
flag is set to **true**. It looks like this config is used for both:
   
   1. Pre-combining: 
[link](https://github.com/apache/hudi/blob/master/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L182)
   2. Deleting incoming records already present in the table:
[link](https://github.com/apache/hudi/blob/master/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L158)
   
   As far as the behavior of the insert overwrite API is concerned, it should
always delete the partition and write the incoming records; dropping duplicates
should only pre-combine the input records.
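   For reference, a minimal sketch of a write that hits this path. The option
keys are the standard Hudi Spark datasource configs involved here; the table
name, path, and field names are made up for illustration:

   ```scala
   import org.apache.spark.sql.SaveMode

   // Hypothetical table/path/field names, for illustration only.
   df.write.format("hudi")
     .option("hoodie.datasource.write.operation", "insert_overwrite")
     // With this set to true, incoming records that already exist in the
     // table are dropped, which interferes with insert overwrite's
     // "replace the whole partition" semantics.
     .option("hoodie.datasource.write.insert.drop.duplicates", "true")
     .option("hoodie.datasource.write.recordkey.field", "uuid")
     .option("hoodie.datasource.write.precombine.field", "ts")
     .option("hoodie.table.name", "my_table")
     .mode(SaveMode.Append)
     .save("/tmp/hudi/my_table")
   ```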
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

