On Wed, Feb 11, 2026 at 6:50 PM Tobias <[email protected]> wrote:

> Hi Krutika and Amogh,
>
> I'm not sure if handling this on the implementation level of a REST
> catalog is the best solution. The remove-snapshot update implementation of
> iceberg-go, iceberg-rust and iceberg-java all silently remove references to
> snapshots. A catalog does not know if the received update was produced by
> running ExpireSnapshots or if it is an explicit call, changing the behavior
> to fail instead of cleaning up referenced branches means the REST catalog
> diverges in behavior from other catalogs.
>
>
Hi Tobias,

That is a valid point -- that a catalog cannot tell if a remove-snapshot
update came from a client or a snapshot expiration maintenance job. If it
is the former, we do want to remove both the snapshot and any refs it is
associated with, but the solution I was proposing would silently retain the
snapshot a client intended to delete.


> Assuming that ExpireSnapshots always produces RemoveSnapshotRef alongside
> RemoveSnapshots updates, a boolean field could be added to RemoveSnapshots
> which would fail instead of doing the silent cleanup of snapshot references.
>
>
This solution seems to address both the concerns, while still preserving
the existing public signature for RemoveSnapshots() in iceberg-go. So
existing callers will be unaffected.

Thanks,
Krutika


> Kind regards,
> Tobias
>
> Am Mi., 11. Feb. 2026 um 11:54 Uhr schrieb Krutika Dhananjay <
> [email protected]>:
>
>> Hi Amogh,
>>
>> Thanks for the response. You're right that ExpireSnapshots correctly
>> identifies reachable snapshots at computation time. Java and Go
>> implementations both do this. The race window I'm concerned about is a
>> change of state between when ExpireSnapshots computes the candidate list
>> and when Commit executes.
>>
>> Specifically:
>> 1. ExpireSnapshots loads metadata M1, computes candidates [snap-1]
>> (unreferenced in M1)
>> 2. Client concurrently commits SetSnapshotRef("feature-branch", snap-1)
>> (metadata is now M2)
>> 3. Maintenance job's Commit loads fresh metadata M2 and applies
>> RemoveSnapshots([snap-1]) against it
>> 4. CAS succeeds because maintenance has the latest pointer
>>
>> The result is that both snap-1 and the newly added feature-branch ref are
>> removed, since RemoveSnapshots also cleans up refs pointing to deleted
>> snapshots.
>>
>> Having said that, I do think this is probably best handled at the catalog
>> level rather than changing client library semantics or the spec. We'll
>> handle it in our catalog's commit path by re-validating the candidate list
>> against the freshly loaded metadata before applying the updates.
>>
>> Thanks,
>> Krutika
>>
>> On Tue, Feb 10, 2026 at 12:03 AM Amogh Jahagirdar <[email protected]>
>> wrote:
>>
>>> Hey Krutika,
>>>
>>> >1. A maintenance job identifies snap-1 as eligible for expiration.
>>> > 2. A client commits SetSnapshotRef("feature-branch", snap-1).
>>> > If the client commits first, the maintenance job loads the fresh
>>> metadata (which now contains a ref for feature-branch -> snap-1), builds
>>> its RemoveSnapshots([snap-1]) update, and commits. The CAS > succeeds
>>> because the maintenance job has the latest metadata pointer. But the
>>> resulting metadata is corrupt: feature-branch points to a deleted snapshot.
>>>
>>> This situation shouldn't happen. In the scenario described, after the
>>> client commits, it has committed an addition of a reference to a snapshot.
>>> The maintenance job performing snapshot removal, should not be removing
>>> "snap-1" because it's still referenced (assuming retention is set correctly
>>> etc). The Java reference implementation of expire snapshots
>>> implementation does a reachability of what
>>> <https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L299>
>>> snapshots are still referenced, and those will not be eligible for cleanup.
>>> Here's the overall algorithm
>>> <https://iceberg.apache.org/spec/#snapshot-retention-policy>.
>>>
>>> Let me know if I misunderstood the scenario or even better if there's a
>>> test that can repro the issue, but I don't think we should need another
>>> update type, the cleanup logic in snapshot expiration already should not be
>>> considering valid referenced snapshots for removal.
>>>
>>> Thanks,
>>> Amogh Jahagirdar
>>>
>>> On Sun, Feb 8, 2026 at 10:14 PM Krutika Dhananjay <[email protected]>
>>> wrote:
>>>
>>>> Adding Matt Topol to the discussion.
>>>>
>>>> Thanks,
>>>> Krutika
>>>>
>>>> On Fri, Feb 6, 2026 at 3:55 PM Krutika Dhananjay <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> I'd like to bring up a race condition between snapshot expiration and
>>>>> concurrent client commits that add refs to snapshots.
>>>>>
>>>>> *The Problem*
>>>>> Consider a table with snap-1 (unreferenced) and snap-2 (main). Two
>>>>> operations happen concurrently:
>>>>>
>>>>> 1. A maintenance job identifies snap-1 as eligible for expiration.
>>>>> 2. A client commits SetSnapshotRef("feature-branch", snap-1).
>>>>>
>>>>> If the client commits first, the maintenance job loads the fresh
>>>>> metadata (which now contains a ref for feature-branch -> snap-1), builds
>>>>> its RemoveSnapshots([snap-1]) update, and commits. The CAS succeeds 
>>>>> because
>>>>> the maintenance job has the latest metadata pointer. But the resulting
>>>>> metadata is corrupt: feature-branch points to a deleted snapshot.
>>>>>
>>>>> CAS-based optimistic concurrency protects against concurrent pointer
>>>>> updates, but it does not validate the semantic correctness of the 
>>>>> operation
>>>>> against the current state. The maintenance job's commit is structurally
>>>>> valid (latest pointer) but semantically invalid (removes a referenced
>>>>> snapshot).
>>>>>
>>>>> *Proposed Fix in iceberg-go*
>>>>> We addressed <https://github.com/apache/iceberg-go/pull/716> this in
>>>>> iceberg-go by modifying MetadataBuilder.RemoveSnapshots() to skip removing
>>>>> any snapshot that is currently referenced by a branch or tag. This works
>>>>> correctly -- the validation happens against the same metadata state that
>>>>> the CAS will verify hasn't changed.
>>>>> However, this changes the existing behavior of RemoveSnapshots, which
>>>>> currently removes the snapshot unconditionally and cleans up any refs that
>>>>> pointed to it. This would be a deviation from the behaviour in 
>>>>> iceberg-java
>>>>> and iceberg-rs.
>>>>>
>>>>> *Proposed Alternative: AssertSnapshotNotReferenced*
>>>>> A less invasive approach that preserves the existing RemoveSnapshots
>>>>> behavior would be to introduce a new assertion update --
>>>>> AssertSnapshotNotReferenced. This would be analogous to existing
>>>>> assertions, but for validating that a snapshot has no refs pointing to it.
>>>>>
>>>>> During commit, a catalog-side maintenance job would include this
>>>>> assertion alongside its RemoveSnapshots update:
>>>>>
>>>>> updates: [
>>>>>       AssertSnapshotNotReferenced(snap-1),
>>>>>       RemoveSnapshots([snap-1])
>>>>> ]
>>>>>
>>>>> The assertion would be validated against the freshly loaded metadata
>>>>> during commit. If a concurrent client had linked a ref to snap-1 in the
>>>>> meantime, the assertion fails, the CAS-based commit is never attempted, 
>>>>> and
>>>>> the maintenance job can retry with updated knowledge of the table state.
>>>>>
>>>>> This approach:
>>>>> - Does not change existing RemoveSnapshots semantics
>>>>> - Can be implemented uniformly across Java, Rust, and Go
>>>>>
>>>>> Would love to hear thoughts on the preferred way to handle this.
>>>>>
>>>>> Thanks,
>>>>> Krutika
>>>>>
>>>>

Reply via email to