istreeter commented on issue #9831: URL: https://github.com/apache/hudi/issues/9831#issuecomment-1752826301
Thank you @xicm for sharing the link to RFC-66. It was enlightening on how the problem will be addressed in the future. But for now, keeping the conversation to the current release (0.14.0): reading RFC-66 made me realise there are two variations on the scenario I described above:

- If the table has `hoodie.index.type=BUCKET` and `hoodie.index.bucket.engine=CONSISTENT_HASHING`, then Writer A and Writer B will both attempt to write the record to the same file group. Writer B will detect a conflict and fail its commit, so there will not be a duplicate record in the table.
- If the table has any other type of index, then Writer A and Writer B will write the record to different file groups. Both writes will succeed and there will be no conflict, but unfortunately the table will contain a duplicate record.

My question remains: is all this correct even for upserts, i.e. with `hoodie.datasource.write.operation=upsert`? I am pressing the point because the documentation says:

> UPSERT Guarantee: The target table will NEVER show duplicates.

But as I understand it, that is only true when using a consistent hashing index.
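To make the scenario concrete, here is a minimal sketch of the writer configuration I have in mind (the table name, record key field, and lock provider are placeholders, and both Writer A and Writer B would run with options like these concurrently):

```scala
// Sketch only: the configuration under discussion, with placeholder values.
df.write.format("hudi").
  option("hoodie.table.name", "my_table").                    // placeholder
  option("hoodie.datasource.write.recordkey.field", "id").    // placeholder
  option("hoodie.datasource.write.operation", "upsert").
  option("hoodie.index.type", "BUCKET").
  option("hoodie.index.bucket.engine", "CONSISTENT_HASHING"). // first variation above
  option("hoodie.write.concurrency.mode", "optimistic_concurrency_control").
  option("hoodie.write.lock.provider", "...").                // a concrete LockProvider class is required here
  mode("append").
  save(basePath)
```

With any index type other than the bucket index (the second variation above), my understanding is that the same two writers would route the record to different file groups, both commits would succeed, and the duplicate would land in the table.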