Hello Andrew,

I think you'll find the discussion in this GitHub issue [1] very relevant
to your problem.
In fact, this problem has come up a few times in the Iceberg community and
I think now would be a good time to revisit this discussion.

> It may be useful in similar situations to update the table only
> if certain metadata fields have not changed, without tying these fields to
> specific snapshots.

I raised a PR [2] previously to implement precisely this.
Unfortunately, progress on the PR stalled after a few rounds of reviews due
to limited reviewer bandwidth at the time.
I would be happy to revive the PR if the community agrees that this is
still the right way to solve this issue.
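
To make the idea concrete, here is a minimal sketch of a commit that only
succeeds if certain metadata values are unchanged. It is an in-memory model
for illustration, not the Iceberg API and not the code in the PR; the names
`TableState`, `commit_if_unchanged`, `CommitConflict` and the property key
`writer.committed-offset` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class CommitConflict(Exception):
    """An expected metadata value no longer matches the table."""


@dataclass
class TableState:
    # Hypothetical stand-in for table metadata; not an Iceberg class.
    properties: Dict[str, str] = field(default_factory=dict)
    files: List[str] = field(default_factory=list)

    def commit_if_unchanged(
        self,
        expected: Dict[str, Optional[str]],
        new_files: List[str],
        updates: Dict[str, str],
    ) -> None:
        # Validate first: None means the key must be absent.
        for key, want in expected.items():
            found = self.properties.get(key)
            if found != want:
                raise CommitConflict(f"{key}: expected {want!r}, found {found!r}")
        # Only then apply. A real catalog would have to do both steps atomically.
        self.files.extend(new_files)
        self.properties.update(updates)


OFFSET_KEY = "writer.committed-offset"  # hypothetical property name
table = TableState()

# First commit of a batch covering log offsets up to 100.
table.commit_if_unchanged({OFFSET_KEY: None}, ["batch-1.parquet"], {OFFSET_KEY: "100"})

# A retry of the same batch after a crash still expects no offset and is rejected.
try:
    table.commit_if_unchanged({OFFSET_KEY: None}, ["batch-1.parquet"], {OFFSET_KEY: "100"})
    duplicate_rejected = False
except CommitConflict:
    duplicate_rejected = True
```

In this model the check does not depend on any snapshot being retained, so
snapshot expiry cannot remove the value the writer compares against.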

Best wishes,
Farooq

[1] https://github.com/apache/iceberg/issues/6514
[2] https://github.com/apache/iceberg/pull/6513

On Fri, Jul 25, 2025 at 12:06 PM Andrew Wong <aw...@redpanda.com> wrote:

> BLUF:
> - We are using snapshot references to preserve custom table-level metadata
>   that currently exists in snapshot summaries. Is this an anti-pattern or
>   expected usage?
> - If it is an anti-pattern, is there something else in the spec we can use
>   for this purpose? If not, would it make sense to introduce table-level
>   metadata in the spec?
>
> Details below:
>
> Hello Iceberg community,
>
> We (Redpanda[1][2]) have built a log storage engine that, in addition to
> writing log format data, writes data as Parquet files and commits them to
> the Iceberg catalog. One of our requirements is to ensure exactly-once
> delivery of records into Iceberg. To this end, we keep metadata in two
> places:
> - In the Iceberg table, we record the log position up to which data has
>   been committed as a field in each new Iceberg snapshot’s summary.
> - In our system, we checkpoint this same position up to which we have
>   committed to Iceberg.
>
> It’s possible for these to diverge (e.g. in the event of a node failure in
> between the above two events), but in such cases, the Iceberg table is
> taken as the source of truth. As I understand it, this is the same
> technique the Kafka Connect connector uses.
>
> But there is a problem with this approach when considering snapshot expiry
> alongside concurrent updates from multiple systems. While the default
> snapshot expiration is 5 days, it’s conceivable that a user sets the
> table’s snapshot expiry to something significantly lower to avoid metadata
> bloat. On top of that, we cannot assume that our system is the only system
> writing to Iceberg, and the main snapshot is the only snapshot guaranteed
> to be retained at all times. It’s thus conceivable that external systems
> add snapshots to the table and that snapshot expiry removes the snapshot
> metadata we require. If these conditions are met in a moment of
> divergence, there is room for exactly-once delivery to be violated and for
> files to be committed to the table more than once.
>
> To mitigate this, we maintain an Iceberg tag for the latest snapshot
> written by our system, and rely on the snapshot reference expiry policy[3]
> to ensure that these tagged snapshots aren’t removed, on the assumption
> that users are more likely to tune down the `max-snapshot-age-ms` property
> (to keep manifest list size small) than the `max-ref-age-ms` property.
>
> There are still at least a couple of issues with this approach:
> - A user can still set `max-ref-age-ms` to something pathologically small
>   and end up causing an exactly-once violation.
> - It feels like we’re overloading the intended behavior of tags by using
>   them to force explicit snapshot retention.
>
> Our question is: is there anything better that we can be doing here? Are
> there other parts of the spec that can serve our needs? The table
> properties field seems close to what we want, but:
> - It is explicitly described as being not meant for arbitrary metadata[4].
> - For it to be useful for our use case, we'd need some kind of table
>   requirement that checks these properties atomically (today, we use
>   snapshot-based table requirements when we commit).
>
> So if nothing existing fits, do folks have thoughts on generalized ways to
> store custom metadata in the table? As an example, is there any appetite
> for adding a different table-level metadata field to the spec? As Iceberg
> becomes adopted by more systems, it's not hard to imagine similar
> requirements popping up elsewhere. It may be useful in similar situations
> to update the table only if certain metadata fields have not changed,
> without tying these fields to specific snapshots.
>
>
> Thanks,
> Andrew
>
> [1] https://www.redpanda.com/
> [2] https://github.com/redpanda-data/redpanda
> [3] https://iceberg.apache.org/spec/#snapshot-retention-policy
> [4] https://iceberg.apache.org/spec/#table-metadata-fields
>
