parisni commented on issue #6531: URL: https://github.com/apache/hudi/issues/6531#issuecomment-1230429062
`hoodie.merge.allow.duplicate.on.inserts=true` fixes the problem.

BTW, I suggest updating the documentation for INSERT (proposed addition in bold):

> INSERT: This operation is very similar to upsert in terms of heuristics/file sizing but completely skips the index lookup step. Thus, it can be a lot faster than upserts for use-cases like log de-duplication (in conjunction with options to filter duplicates mentioned below). **Still, duplicates are usually merged by default, since `hoodie.merge.allow.duplicate.on.inserts=false`.** This is also suitable for use-cases where the table can tolerate duplicates, but just needs the transactional writes/incremental pull/storage management capabilities of Hudi.
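For reference, here is a minimal sketch of how the setting mentioned above might be passed to a PySpark write. The helper `hudi_insert_options`, the table name, the field names, and the output path are hypothetical and used only for illustration. The `hoodie.*` keys are standard Hudi write options, but check them against the Hudi version in use.

```python
def hudi_insert_options(table_name, record_key, precombine_field, allow_duplicates=True):
    """Build Hudi writer options for an INSERT that may keep duplicate keys.

    Hypothetical helper for illustration; it only assembles a dict of options.
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "insert",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        # The setting discussed in this comment, rendered as "true" or "false".
        "hoodie.merge.allow.duplicate.on.inserts": str(allow_duplicates).lower(),
    }


# Assumed table and column names, chosen only for this example.
options = hudi_insert_options("events", "event_id", "ts")

# With a SparkSession and a DataFrame `df` already available (not created here),
# the options could be applied roughly like this:
# df.write.format("hudi").options(**options).mode("append").save("/tmp/hudi/events")
```

Passing `allow_duplicates=False` produces the value that this comment describes as the default.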
