bvaradar commented on issue #1585: URL: https://github.com/apache/incubator-hudi/issues/1585#issuecomment-623965327
Try setting hoodie.cleaner.commits.retained=1 to keep the number of retained file versions to a minimum.

Hudi also has an option to filter out duplicate rows. For DeltaStreamer, pass the flags "--filter-dupes --op INSERT". For Spark DataSource based writes, set hoodie.datasource.write.insert.drop.duplicates=true and hoodie.datasource.write.operation=insert.
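
For the Spark DataSource case, the options from the comment could be wired up roughly as in the PySpark sketch below. This is only an illustration, not a tested recipe for this issue: the two helper functions, the table name, the key and precombine field names, and the base path are hypothetical placeholders. Only the cleaner, operation, and drop-duplicates settings come from the comment itself.

```python
def hudi_insert_dedup_options(table_name, record_key_field, precombine_field):
    """Build Hudi write options for an insert that drops duplicate rows.

    The three arguments are placeholders for your own table; they are
    assumptions and do not appear in the original comment.
    """
    return {
        # Standard Hudi write settings (values supplied by the caller).
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key_field,
        "hoodie.datasource.write.precombine.field": precombine_field,
        # Settings suggested in the comment.
        "hoodie.datasource.write.operation": "insert",
        "hoodie.datasource.write.insert.drop.duplicates": "true",
        "hoodie.cleaner.commits.retained": "1",
    }


def write_with_dedup(df, base_path, options):
    """Append a Spark DataFrame to a Hudi table using the given options.

    `df` is expected to be a pyspark.sql.DataFrame and `base_path` the
    table location; the Hudi Spark bundle must be on the classpath.
    """
    (
        df.write.format("org.apache.hudi")
        .options(**options)
        .mode("append")
        .save(base_path)
    )


if __name__ == "__main__":
    # Hypothetical names, shown only to make the resulting options visible.
    opts = hudi_insert_dedup_options("my_table", "id", "ts")
    for key, value in sorted(opts.items()):
        print(f"{key}={value}")
```

Calling write_with_dedup(df, "/path/to/table", opts) from a Spark session would then perform the write. Keeping the options in a plain dict makes it easy to check what is being passed before any job is launched.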
