jfz opened a new pull request, #5230:
URL: https://github.com/apache/iceberg/pull/5230

   This PR adds a transform API for table properties. When a commit hits a 
conflict, the transforms are re-applied to the refreshed properties, which 
makes truly stateful, transactional property updates possible.
   
   Currently, a table property update is effectively "last commit wins". The 
new property value is computed outside of the Iceberg library, so it is not 
recomputed when a commit is retried after a conflict. If the update is derived 
from the existing value, a retried commit can silently overwrite another 
commit's change.
   
   Consider the following example use case: 
   The table property "subscribers" tracks who should be notified whenever 
there is a new data write. Its value is a set of users, and it is managed by 
adding and removing users.
   
   Steps that lead to a lost update:
   1. Initially there is only one user: subscribers = <u1>
   2. Commit A attempts to add user u2 to subscribers: set value to <u1,u2>
   3. Commit B attempts to add user u3 to subscribers: set value to <u1,u3>
   4. Commit B succeeds first; the subscribers value is now <u1,u3>
   5. Commit A fails, then its retry succeeds, setting the value to <u1,u2>
   6. Both A and B have now succeeded, and the final value is <u1,u2>   =>   
commit B is effectively lost.
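   
   The steps above can be reproduced with a small simulation. This is only a 
sketch: LostUpdateDemo, its version counter, and its commit method are 
hypothetical stand-ins for optimistic table commits, not Iceberg code. The 
point is that commit A's value is computed once, before the conflict, and the 
retry reuses it unchanged.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for table metadata; not Iceberg code.
class LostUpdateDemo {
    static int version;
    static Set<String> subscribers;

    // Optimistic commit: succeeds only if the caller's base version is current.
    static boolean commit(int baseVersion, Set<String> newValue) {
        if (baseVersion != version) {
            return false; // conflict: another commit landed first
        }
        subscribers = new TreeSet<>(newValue);
        version++;
        return true;
    }

    static Set<String> run() {
        // Step 1: initial state, subscribers = <u1>
        version = 0;
        subscribers = new TreeSet<>(Set.of("u1"));

        // Step 2: commit A computes <u1,u2> from the value it read.
        int baseA = version;
        Set<String> valueA = new TreeSet<>(subscribers);
        valueA.add("u2");

        // Step 3: commit B computes <u1,u3> from the same starting value.
        int baseB = version;
        Set<String> valueB = new TreeSet<>(subscribers);
        valueB.add("u3");

        // Step 4: B commits first and succeeds.
        commit(baseB, valueB);

        // Step 5: A's first attempt conflicts; the retry refreshes the base
        // version but reuses the value computed before the conflict.
        if (!commit(baseA, valueA)) {
            commit(version, valueA);
        }

        // Step 6: u3 is gone.
        return subscribers;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```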
   
   With the transform API, a commit can request a transform such as "add user 
u2 to the existing set" instead of "set it to <u1,u2>". The update is not lost, 
because the transform is re-applied on each commit retry.
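   
   A minimal sketch of the idea, under stated assumptions: the transform is 
modeled as a java.util.function.UnaryOperator over the property value, and 
TransformRetryDemo, commit, and add are hypothetical names used only for 
illustration. They are not the API this PR introduces. On a conflict, the retry 
re-reads the current value and applies the transform to it again.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

// Hypothetical in-memory stand-in for table metadata; not the PR's API.
class TransformRetryDemo {
    static int version;
    static Set<String> subscribers;

    // Optimistic commit: succeeds only if the caller's base version is current.
    static boolean commit(int baseVersion, Set<String> newValue) {
        if (baseVersion != version) {
            return false; // conflict: another commit landed first
        }
        subscribers = new TreeSet<>(newValue);
        version++;
        return true;
    }

    // Transform meaning "add this user to whatever the set currently is".
    static UnaryOperator<Set<String>> add(String user) {
        return current -> {
            Set<String> updated = new TreeSet<>(current);
            updated.add(user);
            return updated;
        };
    }

    static Set<String> run() {
        version = 0;
        subscribers = new TreeSet<>(Set.of("u1"));

        UnaryOperator<Set<String>> addU2 = add("u2");
        UnaryOperator<Set<String>> addU3 = add("u3");

        // Commit A reads the table state before B commits.
        int baseA = version;
        Set<String> seenByA = new TreeSet<>(subscribers);

        // Commit B succeeds first: <u1,u3>.
        commit(version, addU3.apply(subscribers));

        // Commit A's first attempt conflicts. Each retry refreshes the value
        // and re-applies the transform instead of reusing a stale result.
        boolean committed = commit(baseA, addU2.apply(seenByA));
        while (!committed) {
            committed = commit(version, addU2.apply(subscribers));
        }

        return subscribers;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

   In this simulation both additions survive, because commit A never writes a 
value derived from the state it read before commit B landed.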
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

