istreeter commented on issue #9831:
URL: https://github.com/apache/hudi/issues/9831#issuecomment-1752826301

   Thank you @xicm for sharing the link to RFC-66.  This was enlightening for 
how the problem will be addressed in the future.
   
   But for now, keeping the conversation to the current release (0.14.0): 
reading RFC-66 made me realise there are two variations on the scenario I 
described above:
   
   - If the table has `hoodie.index.type=BUCKET` and 
`hoodie.index.bucket.engine=CONSISTENT_HASHING`, then Writer A and Writer B 
will both attempt to write the record to the same file group.  Writer B will 
therefore detect a conflict and fail its commit, so there will be no duplicate 
record in the table.
   - If the table has any other type of index, then Writer A and Writer B will 
write the record to different file groups.  Both writes will succeed, no 
conflict will be detected, and unfortunately the table will contain a duplicate 
record.
   
   
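   For concreteness, here is the kind of writer configuration I have in mind 
for the first case, where (as I understand it) concurrent upserts of the same 
key should conflict rather than duplicate.  This is only a sketch of my 
understanding; the lock provider and its settings are placeholders, and I may 
have some of this wrong:
   
   ```properties
   # Consistent-hashing bucket index: same key always maps to the same file group
   hoodie.index.type=BUCKET
   hoodie.index.bucket.engine=CONSISTENT_HASHING
   
   hoodie.datasource.write.operation=upsert
   
   # Multi-writer setup (lock provider shown here is just an example choice)
   hoodie.write.concurrency.mode=optimistic_concurrency_control
   hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
   hoodie.cleaner.policy.failed.writes=LAZY
   ```
   
   In the second case, only the index settings differ, which is why I would 
expect the conflict check to pass and both writes to land.
   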
   My question remains... Is all this correct even for upserts, i.e. with 
`hoodie.datasource.write.operation=upsert`?
   
   I am pushing the point because the documentation says:
   
   > UPSERT Guarantee: The target table will NEVER show duplicates.
   
   But from how I understand it, that is only true if using a consistent 
hashing index.
   

