jfz opened a new pull request, #5230: URL: https://github.com/apache/iceberg/pull/5230
This PR adds a transform API for table properties. Transforms are re-applied against the refreshed properties whenever a commit conflicts, which makes truly stateful and transactional property updates possible.

Currently, a table property update is basically "last commit wins". The new property value is computed outside of the Iceberg library, so it is not recomputed during commit retries after a conflict. When the update is derived from the existing value, a retried commit can effectively overwrite another commit.

Consider the following example use case. The table property "subscribers" tracks who should be notified whenever there is a new data write. Its value is a set of users, managed by adding and removing users.

Steps that lose an update:

1. Initially there is only one user: subscribers = <u1>.
2. Commit A attempts to add user u2 to subscribers: set value to <u1,u2>.
3. Commit B attempts to add user u3 to subscribers: set value to <u1,u3>.
4. Commit B succeeds first, so the subscribers value is now <u1,u3>.
5. Commit A fails, then its retry succeeds, setting the value to <u1,u2>.
6. Both A and B have now succeeded, and the final value is <u1,u2>, so commit B is effectively lost.

With the transform API, a commit can request a transform such as "add user u2 to the existing set" instead of "set it to <u1,u2>". The update is never lost because the transform is re-applied on every commit retry.
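The subscribers scenario can be reproduced with a small, self-contained simulation. This is only a sketch under stated assumptions: `PropertyTable`, `refresh`, `commit`, and `addUser` are hypothetical names invented for illustration and are not the Iceberg API or the interface added by this PR. The table is reduced to a single string property guarded by a version check, which stands in for Iceberg's optimistic commit.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

// Hypothetical in-memory stand-in for table metadata (NOT the Iceberg API).
final class PropertyTable {
  static final class Snapshot {
    final int version;
    final String subscribers;
    Snapshot(int version, String subscribers) {
      this.version = version;
      this.subscribers = subscribers;
    }
  }

  private int version = 0;
  private String subscribers = "u1";

  synchronized Snapshot refresh() {
    return new Snapshot(version, subscribers);
  }

  // Optimistic commit: succeeds only if nothing was committed since baseVersion.
  synchronized boolean commit(int baseVersion, String newValue) {
    if (baseVersion != version) {
      return false;
    }
    subscribers = newValue;
    version++;
    return true;
  }
}

final class TransformRetryDemo {
  // Hypothetical helper: add a user to a comma-separated, sorted set.
  static String addUser(String csv, String user) {
    Set<String> users = new TreeSet<>(Arrays.asList(csv.split(",")));
    users.add(user);
    return String.join(",", users);
  }

  // "Last commit wins": A's value is computed once and re-sent unchanged on retry.
  static String setValueScenario() {
    PropertyTable table = new PropertyTable();
    PropertyTable.Snapshot baseA = table.refresh();
    PropertyTable.Snapshot baseB = table.refresh();
    String valueA = addUser(baseA.subscribers, "u2");
    String valueB = addUser(baseB.subscribers, "u3");
    table.commit(baseB.version, valueB); // B succeeds first
    if (!table.commit(baseA.version, valueA)) { // A conflicts
      table.commit(table.refresh().version, valueA); // retry with the stale value
    }
    return table.refresh().subscribers;
  }

  // Transform approach: A re-applies "add u2" to the refreshed value on each retry.
  static String transformScenario() {
    PropertyTable table = new PropertyTable();
    UnaryOperator<String> addU2 = current -> addUser(current, "u2");
    UnaryOperator<String> addU3 = current -> addUser(current, "u3");
    PropertyTable.Snapshot base = table.refresh(); // A's starting point
    PropertyTable.Snapshot baseB = table.refresh();
    table.commit(baseB.version, addU3.apply(baseB.subscribers)); // B succeeds first
    while (!table.commit(base.version, addU2.apply(base.subscribers))) {
      base = table.refresh(); // conflict: refresh, then re-apply the transform
    }
    return table.refresh().subscribers;
  }

  public static void main(String[] args) {
    System.out.println("set value: " + setValueScenario());
    System.out.println("transform: " + transformScenario());
  }
}
```

In this sketch the set-value path ends with subscribers `u1,u2` (u3 is lost), while the transform path ends with `u1,u2,u3`. The only difference is that the retry loop carries a function of the current value rather than a precomputed value.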
