Hey folks,

I would like to propose a feature for Apache Polaris: rolling back
snapshots of replace operations when there are conflicts during concurrent
writes to an Iceberg table (compaction and other writers trying to commit
to the table at the same time). Ryan suggested this as an alternative when
I was proposing Priority Amongst Writers [1] in the Apache Iceberg
community. In effect, it makes compaction always a low-priority process.

Earlier, I added this feature as a client-side change in the Apache Iceberg
repo [2]. It got some traction but was never completed. Thinking about it
again, Apache Polaris seems to be the best place for it: it would benefit
writer clients in other languages as well, and Polaris is the one that
actually applies commits, based on the requirements and updates sent by the
Iceberg REST client.

Here is my draft PR [3] showing how I think this can be achieved. It is
currently enabled by a table property; I am happy to discuss other knobs,
for example, maybe checking a snapshot property?

The logic is essentially this: if the base (B) on which the snapshot we
want to commit is based has changed to something like B`, and all the
snapshots between B and B` are of operation type *REPLACE*, Polaris adds
further updates to the same update-table request:
1. Move the snapshot ref back to B.
2. [Optional] Remove the snapshots between B` and B, given that they are
all *REPLACE*.
It then retries the requirements and updates against the updated base to
see whether they succeed. All of this is done as part of one update
request, which is then committed to the table.
Doing it this way preserves schema changes for which no new snapshot was
created, only a new metadata.json.
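
To make the steps above concrete, here is a minimal sketch in Python (used
only for brevity; it is not the code from the draft PR). The names
Snapshot, replace_only_chain, and rollback_updates are hypothetical, and
the update dictionaries are only loosely modeled on the Iceberg REST
set-snapshot-ref and remove-snapshots actions, so treat their exact shape
as an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical, simplified stand-in for an Iceberg snapshot.
    snapshot_id: int
    parent_id: Optional[int]
    operation: str  # e.g. "append", "replace", "overwrite", "delete"


def replace_only_chain(
    snapshots: Dict[int, Snapshot], current_id: int, base_id: int
) -> Optional[List[int]]:
    """Walk from the current snapshot (B`) back towards the base (B).

    Returns the ids strictly after B up to and including B` if every one
    of them is a replace; returns None if any other operation is found or
    if B is not an ancestor of B`.
    """
    chain: List[int] = []
    cursor: Optional[int] = current_id
    while cursor is not None and cursor != base_id:
        snap = snapshots.get(cursor)
        if snap is None or snap.operation != "replace":
            return None
        chain.append(cursor)
        cursor = snap.parent_id
    if cursor != base_id:
        return None  # ran off the start of history without reaching B
    return chain


def rollback_updates(
    snapshots: Dict[int, Snapshot],
    current_id: int,
    base_id: int,
    ref: str = "main",
    remove: bool = True,
) -> Optional[List[dict]]:
    """Build the extra updates to prepend to the writer's own updates.

    Returns None when no rollback applies (no conflict, or the conflicting
    snapshots are not all replaces).
    """
    chain = replace_only_chain(snapshots, current_id, base_id)
    if not chain:
        return None
    # Step 1: move the ref back to B.
    updates = [
        {
            "action": "set-snapshot-ref",
            "ref-name": ref,
            "type": "branch",
            "snapshot-id": base_id,
        }
    ]
    # Step 2 (optional): drop the rolled-back replace snapshots.
    if remove:
        updates.append({"action": "remove-snapshots", "snapshot-ids": chain})
    return updates


history = {
    1: Snapshot(1, None, "append"),
    2: Snapshot(2, 1, "replace"),
    3: Snapshot(3, 2, "replace"),
}
extra = rollback_updates(history, current_id=3, base_id=1)
```

In this sketch the writer committed against snapshot 1 while two
compactions (2 and 3) landed in between, so extra moves the ref back to 1
and removes 3 and 2. The writer's original requirements and updates would
then be re-validated on top of that, all within the same request.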

Happy to hear your thoughts on this.

Links:
[1]
https://docs.google.com/document/d/1pSqxf5A59J062j9VFF5rcCpbW9vdTbBKTmjps80D-B0/edit?tab=t.0#heading=h.fn6jmpw6phpn
[2] https://github.com/apache/iceberg/pull/5888
[3] https://github.com/apache/polaris/pull/1285

Best,
Prashant Singh
