nsivabalan commented on issue #3418:
URL: https://github.com/apache/hudi/issues/3418#issuecomment-917431256


   If you wish to dedup with bulk_insert, you also need to set 
"hoodie.combine.before.insert" to true. 
   Just to clarify, bulk_insert does not look at any records already in 
storage. Setting this config only ensures that the incoming batch is deduped 
before it is written to Hudi. 
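
   As a rough sketch of the config described above, assuming a PySpark 
writer: the helper name `bulk_insert_options`, the table name, and the 
field names `id` and `ts` are placeholders, not taken from this issue. 

```python
def bulk_insert_options(table_name, record_key_field, precombine_field):
    """Build Hudi write options for a bulk_insert that dedups the incoming batch."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "bulk_insert",
        "hoodie.datasource.write.recordkey.field": record_key_field,
        "hoodie.datasource.write.precombine.field": precombine_field,
        # The setting discussed above: dedup the incoming batch before writing.
        "hoodie.combine.before.insert": "true",
    }


# Placeholder table and field names, for illustration only.
opts = bulk_insert_options("my_table", "id", "ts")

# With a Spark DataFrame `df` and a base path `base_path` (both assumed here),
# the options would be passed along these lines:
# df.write.format("hudi").options(**opts).mode("append").save(base_path)
```
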
   In other words, if you do two bulk_inserts, one after the other, each 
batch will write unique records to Hudi, but if some records overlap 
between batch 1 and batch 2, bulk_insert may not update them, so duplicates 
across batches can remain. 
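
   The behaviour can be pictured with a toy model in plain Python. This is 
not Hudi code: `dedup_batch`, `toy_bulk_insert`, and the `id`/`ts` fields 
are made up for illustration, and it shows the case where an overlapping key 
is written again rather than updated. 

```python
def dedup_batch(batch, key="id", order="ts"):
    """Keep one record per key within a batch (the one with the largest `order`)."""
    latest = {}
    for rec in batch:
        k = rec[key]
        if k not in latest or rec[order] >= latest[k][order]:
            latest[k] = rec
    return list(latest.values())


def toy_bulk_insert(storage, batch):
    """Append the deduped batch without consulting what is already in storage."""
    storage.extend(dedup_batch(batch))


storage = []
batch1 = [{"id": 1, "ts": 1}, {"id": 1, "ts": 2}, {"id": 2, "ts": 1}]
batch2 = [{"id": 2, "ts": 5}, {"id": 3, "ts": 1}]

toy_bulk_insert(storage, batch1)  # id 1 is deduped within batch 1
toy_bulk_insert(storage, batch2)  # id 2 overlaps with batch 1 and is not merged

ids = sorted(rec["id"] for rec in storage)
print(ids)  # [1, 2, 2, 3]
```

   Each batch contributes unique keys on its own, yet key 2 ends up stored 
twice because the second write never checks the first. 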
   
   Hope that clarifies. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

