Thank you Sagar! Here is the issue - https://github.com/apache/hudi/issues/9859
On Friday, October 13, 2023 at 01:52:24 AM EDT, sagar sumit <cod...@apache.org> wrote:

Hi Himabindu,

I am assuming your total data on storage is 700 GB and not the incoming batch. INSERT_DROP_DUPS does work with large data. However, it is more time-consuming, as it needs to tag the incoming records in order to dedupe. I would suggest creating a GitHub issue with Spark UI screenshots and datasource write configs. Also, it would be helpful if you could provide your use case for INSERT_DROP_DUPS. Maybe there is a better alternative.

Regards,
Sagar

On Thu, Oct 12, 2023 at 3:42 AM Himabindu Kosuru <hkos...@yahoo.com.invalid> wrote:

Hi All,

We are using COW tables, and INSERT_DROP_DUPS fails with HoodieUpsertException even on 700 GB of data. The data is partitioned and stored in GCS.

Executors: 150
Executor memory: 40g
Executor cores: 8

Does INSERT_DROP_DUPS work with large data? Any recommendations to make it work, such as Spark config settings?

Thanks,
Bindu
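For readers following the thread, below is a minimal sketch (in Scala) of the kind of write being discussed: a Hudi datasource insert with duplicate dropping enabled (the INSERT_DROP_DUPS path) on a COPY_ON_WRITE table, plus the index/shuffle parallelism options that are a common first tuning step for the record-tagging stage Sagar mentions. The table name, GCS paths, and record key/precombine/partition fields are hypothetical placeholders, not taken from the thread.

    // Sketch only: a Hudi insert that drops incoming records whose keys
    // already exist in the table (INSERT_DROP_DUPS via the datasource API).
    // All names and paths below are hypothetical placeholders.
    import org.apache.spark.sql.{SaveMode, SparkSession}

    val spark = SparkSession.builder()
      .appName("hudi-insert-drop-dups")
      .getOrCreate()

    val df = spark.read.parquet("gs://my-bucket/staging/batch") // hypothetical input

    df.write.format("hudi")
      .option("hoodie.table.name", "my_cow_table")                  // hypothetical
      .option("hoodie.datasource.write.table.type", "COPY_ON_WRITE")
      .option("hoodie.datasource.write.operation", "insert")
      // Drop incoming records that duplicate existing keys.
      .option("hoodie.datasource.write.insert.drop.duplicates", "true")
      .option("hoodie.datasource.write.recordkey.field", "id")      // hypothetical
      .option("hoodie.datasource.write.precombine.field", "ts")     // hypothetical
      .option("hoodie.datasource.write.partitionpath.field", "dt")  // hypothetical
      // Tagging incoming keys against existing files is the expensive step;
      // raising these parallelism settings is a common tuning knob at scale.
      .option("hoodie.bloom.index.parallelism", "1200")
      .option("hoodie.insert.shuffle.parallelism", "1200")
      .mode(SaveMode.Append)
      .save("gs://my-bucket/tables/my_cow_table")                   // hypothetical

One design point worth noting: with Hudi's default (non-global) BLOOM index, duplicates are only detected within the same partition path; deduplicating across partitions requires a global index (e.g. hoodie.index.type=GLOBAL_BLOOM), which makes the tagging step more expensive.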