nsivabalan commented on issue #3418: URL: https://github.com/apache/hudi/issues/3418#issuecomment-917431256
If you wish to dedup with bulk_insert, you also need to set "hoodie.combine.before.insert" to true. Just to clarify, bulk_insert does not look at any records in storage at all, so setting this config only ensures that the incoming batch is deduped before it is written to Hudi. In other words, if you do two bulk_inserts, one after the other, each batch will write unique records to Hudi, but if some records overlap between batch 1 and batch 2, bulk_insert may not update them. Hope that clarifies.
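To make the point above concrete, here is a rough sketch in plain Python. The option keys in `hudi_options` are Hudi write configs, but the table name and the `id` / `ts` field names are hypothetical. `dedup_batch` and `bulk_insert` are toy stand-ins written for illustration, not Hudi APIs. They only model the idea that dedup happens within the incoming batch and storage is never consulted. What actually ends up on disk for overlapping keys depends on Hudi itself.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]

# Writer options one might pass to a Hudi write.
# Table name and field names are hypothetical placeholders.
hudi_options: Dict[str, str] = {
    "hoodie.table.name": "example_table",                # hypothetical
    "hoodie.datasource.write.operation": "bulk_insert",
    "hoodie.datasource.write.recordkey.field": "id",     # hypothetical
    "hoodie.datasource.write.precombine.field": "ts",    # hypothetical
    "hoodie.combine.before.insert": "true",              # dedup the incoming batch
}


def dedup_batch(batch: List[Record]) -> List[Record]:
    """Toy model: keep one record per key in a batch, preferring the largest ts."""
    latest: Dict[Any, Record] = {}
    for rec in batch:
        current = latest.get(rec["id"])
        if current is None or rec["ts"] >= current["ts"]:
            latest[rec["id"]] = rec
    return list(latest.values())


def bulk_insert(storage: List[Record], batch: List[Record]) -> None:
    """Toy model: dedup only the incoming batch, then append without reading storage."""
    storage.extend(dedup_batch(batch))


storage: List[Record] = []

# Batch 1 contains key 1 twice; only one copy of key 1 is written.
bulk_insert(storage, [
    {"id": 1, "ts": 1, "v": "a"},
    {"id": 1, "ts": 2, "v": "b"},
    {"id": 2, "ts": 1, "v": "c"},
])

# Batch 2 overlaps with batch 1 on key 2; the earlier record is not updated.
bulk_insert(storage, [
    {"id": 2, "ts": 5, "v": "d"},
    {"id": 3, "ts": 1, "v": "e"},
])

stored_ids = [rec["id"] for rec in storage]
```

In this toy model each batch contributes unique keys, yet key 2 is left in its original state from batch 1 and also appears again from batch 2. If overlapping records across batches must be reconciled, an operation that looks at existing records, such as upsert, is probably the better fit.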
