istreeter commented on issue #9831:
URL: https://github.com/apache/hudi/issues/9831#issuecomment-1754600158

Thank you, the FAQ is helpful for explaining ways to work with the problem.

I think the summary of this conversation is that duplicates can happen even if `hoodie.datasource.write.operation=upsert`. Therefore I strongly think [this bit of documentation](https://hudi.apache.org/docs/concurrency_control/#multi-writer-guarantees) needs changing. Currently it says:

> UPSERT Guarantee: The target table will NEVER show duplicates.

But that is misleading, because UPSERTS can include both INSERTS and UPDATES. So it should say either:

> UPSERT Guarantee: The target table **MIGHT** show duplicates.

or

> **UPDATE** Guarantee: The target table will NEVER show duplicates.

Sorry to keep pushing this point, but I think it is important: if the documentation is going to provide guarantees, then the guarantees should be worded accurately.
