Hello Andrew,

I think you'll find the discussion in this GitHub issue [1] very relevant to your problem. In fact, this problem has come up a few times in the Iceberg community, and I think now would be a good time to revisit the discussion.

> It may be useful in similar situations to update the table only if certain metadata fields have not changed, without tying these fields to specific snapshots.

I raised a PR [2] previously to implement precisely this. Unfortunately, progress on the PR stalled after a few rounds of review due to limited reviewer bandwidth at the time. I would be happy to revive the PR if the community agrees that this is still the right way to solve this issue.

Best wishes,
Farooq

[1] https://github.com/apache/iceberg/issues/6514
[2] https://github.com/apache/iceberg/pull/6513

On Fri, Jul 25, 2025 at 12:06 PM Andrew Wong <aw...@redpanda.com> wrote:
> BLUF:
> - We are using snapshot references to preserve custom table-level metadata that currently exists in snapshot summaries. Is this an anti-pattern or expected usage?
> - If it is an anti-pattern, is there something else in the spec we can use for this purpose? If not, would it make sense to introduce table-level metadata in the spec?
>
> Details below:
>
> Hello Iceberg community,
>
> We (Redpanda [1][2]) have built a log storage engine that, in addition to writing log-format data, writes data as Parquet files and commits them to the Iceberg catalog. One of our requirements is to ensure exactly-once delivery of records into Iceberg. To this end, we keep metadata in two places:
> - In the Iceberg table, we record the log position up to which data has been committed as a field in each new Iceberg snapshot's summary.
> - In our system, we checkpoint that same position, i.e. how far we have committed to Iceberg.
>
> It's possible for these to diverge (e.g. if a node fails between the two steps above), but in such cases, the Iceberg table is taken as the source of truth. As I understand it, this is the same technique the Kafka Connect connector uses.
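The summary-based watermark described above can be sketched as a small in-memory model. This is not Redpanda's or the Kafka Connect connector's implementation: the summary key `example.committed-offset` and the helper names are hypothetical. Recovery walks the `main` branch's ancestry for the newest recorded position and treats it as the source of truth; if the snapshot carrying it has been expired, the lookup comes back empty, which is the ambiguity the rest of the message is about.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical summary key, chosen for illustration; the Iceberg spec does not define it.
OFFSET_KEY = "example.committed-offset"


@dataclass
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    summary: Dict[str, str] = field(default_factory=dict)


def committed_offset(snapshots: Dict[int, Snapshot], head_id: Optional[int]) -> Optional[int]:
    """Return the newest log position recorded along a branch, or None.

    Walks parent pointers from the branch head. The walk stops early when a
    parent is missing from `snapshots`, which models an expired snapshot.
    """
    current = head_id
    while current is not None and current in snapshots:
        snap = snapshots[current]
        if OFFSET_KEY in snap.summary:
            return int(snap.summary[OFFSET_KEY])
        current = snap.parent_id
    return None


def resume_position(table_offset: Optional[int], local_checkpoint: int) -> int:
    """Pick the log position to resume from after a restart.

    The table is the source of truth, so its watermark wins whenever it is
    found. If it is missing (never written, or its snapshot was expired),
    the only fallback is the local checkpoint, which may lag the table and
    can lead to records being committed twice.
    """
    if table_offset is None:
        return local_checkpoint
    return table_offset


if __name__ == "__main__":
    snaps = {
        1: Snapshot(1, None, {OFFSET_KEY: "100"}),  # written by our system
        2: Snapshot(2, 1),  # external writer
        3: Snapshot(3, 2),  # external writer, current head of main
    }
    print(committed_offset(snaps, 3))  # 100
    del snaps[1]  # snapshot expiry removes the snapshot carrying the watermark
    print(committed_offset(snaps, 3))  # None
```

Snapshots written by other systems do not hide the watermark, since the walk passes through them; only expiring the snapshot that carries it does.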
>
> But there is a problem with this approach when snapshot expiry is combined with concurrent updates from multiple systems. While the default snapshot expiration is 5 days, it's conceivable that a user sets the table's snapshot expiry to something significantly lower to avoid metadata bloat. To boot, we cannot assume that our system is the only system writing to Iceberg, and the snapshot at the head of the main branch is the only snapshot guaranteed to be retained at all times. It's thus conceivable that external systems add snapshots to the table and that snapshot expiry then removes the snapshot metadata we require. If these conditions are met during a moment of divergence, exactly-once delivery can be violated and files can be committed to the table more than once.
>
> To mitigate this, we maintain an Iceberg tag for the latest snapshot written by our system, and rely on the snapshot reference expiry policy [3] to ensure that these tagged snapshots aren't removed, on the assumption that users are more likely to tune down the `max-snapshot-age-ms` property (to keep manifest list size small) than the `max-ref-age-ms` property.
>
> There are still at least a couple of issues with this approach:
> - A user can still set `max-ref-age-ms` to something pathologically small and end up causing an exactly-once violation.
> - It feels like we're overloading the intended behavior of tags by using them to force explicit snapshot retention.
>
> Our question is: is there anything better we can be doing here? Are there other parts of the spec that can serve our needs? The table `properties` field seems close to what we want, but:
> - It is explicitly described as not being meant for arbitrary metadata [4].
> - For it to be useful for our use case, we'd need some kind of table requirement that checks these properties atomically (today, we use snapshot-based table requirements when we commit).
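To see why the tag helps, and why a small `max-ref-age-ms` defeats it, here is a simplified model of the spec's snapshot retention steps: drop refs other than `main` whose snapshot is older than `max-ref-age-ms`, keep every snapshot a surviving ref points to, walk each branch's ancestors until at least `min-snapshots-to-keep` are kept and the next one is older than `max-snapshot-age-ms`, and expire the rest. It is a sketch under assumptions: one set of values applies to every ref (the spec also allows per-ref overrides; in the Java implementation the table-level defaults come from the `history.expire.*` properties), timestamps are toy integers, the tag name is hypothetical, and this is not the Iceberg library's expiration code.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Snap:
    snapshot_id: int
    parent_id: Optional[int]
    timestamp_ms: int


@dataclass
class Ref:
    snapshot_id: int
    kind: str  # "branch" or "tag"


def retained_snapshots(
    snapshots: Dict[int, Snap],
    refs: Dict[str, Ref],
    now_ms: int,
    max_snapshot_age_ms: int,
    max_ref_age_ms: int,
    min_snapshots_to_keep: int = 1,
) -> Set[int]:
    """Return the IDs of snapshots that survive expiration (simplified model)."""
    retained: Set[int] = set()
    for name, ref in refs.items():
        head = snapshots[ref.snapshot_id]
        # Refs other than main are dropped once their snapshot exceeds max-ref-age-ms.
        if name != "main" and now_ms - head.timestamp_ms > max_ref_age_ms:
            continue
        # Every surviving branch or tag keeps the snapshot it points to.
        retained.add(head.snapshot_id)
        if ref.kind != "branch":
            continue
        # Branches also keep ancestors until at least min_snapshots_to_keep
        # are kept and the next ancestor is older than max-snapshot-age-ms.
        kept = 0
        current: Optional[int] = head.snapshot_id
        while current is not None and current in snapshots:
            snap = snapshots[current]
            if kept >= min_snapshots_to_keep and now_ms - snap.timestamp_ms > max_snapshot_age_ms:
                break
            retained.add(current)
            kept += 1
            current = snap.parent_id
    return retained


if __name__ == "__main__":
    snaps = {
        1: Snap(1, None, 100),  # our snapshot carrying the watermark
        2: Snap(2, 1, 900),     # external writer
        3: Snap(3, 2, 950),     # external writer, head of main
    }
    main_only = {"main": Ref(3, "branch")}
    with_tag = {**main_only, "our-watermark": Ref(1, "tag")}
    print(sorted(retained_snapshots(snaps, main_only, 1000, 200, 10**12)))  # [2, 3]
    print(sorted(retained_snapshots(snaps, with_tag, 1000, 200, 10**12)))   # [1, 2, 3]
    print(sorted(retained_snapshots(snaps, with_tag, 1000, 200, 500)))      # [2, 3]
```

With a small `max-snapshot-age-ms`, the walk along `main` stops before the writer's snapshot, so only the tag keeps it alive; once `max-ref-age-ms` drops below that snapshot's age, the protection is gone.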
>
> So if nothing existing fits, do folks have thoughts on generalized ways to store custom metadata in the table? For example, is there any appetite for adding a separate table-level metadata field to the spec? As Iceberg is adopted by more systems, it's not hard to imagine similar requirements popping up elsewhere. It may be useful in similar situations to update the table only if certain metadata fields have not changed, without tying these fields to specific snapshots.
>
> Thanks,
> Andrew
>
> [1] https://www.redpanda.com/
> [2] https://github.com/redpanda-data/redpanda
> [3] https://iceberg.apache.org/spec/#snapshot-retention-policy
> [4] https://iceberg.apache.org/spec/#table-metadata-fields
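The alternative raised in this thread, a table-level metadata field guarded by its own commit requirement, can be modeled as a compare-and-check commit. In this sketch, `assert_ref_snapshot_id` mirrors the idea behind the REST catalog's existing `assert-ref-snapshot-id` requirement, while the `custom` map and `assert_custom_value` are hypothetical: they are not part of the Iceberg spec and not necessarily what PR 6513 proposes. The catalog is a toy in-memory stand-in that checks all requirements and applies updates under one lock.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class TableMetadata:
    main_snapshot_id: Optional[int]
    # Hypothetical table-level metadata map; the Iceberg spec has no such field.
    custom: Dict[str, str] = field(default_factory=dict)


# A requirement is a predicate checked against the current metadata at commit time.
Requirement = Callable[[TableMetadata], bool]


def assert_ref_snapshot_id(expected: Optional[int]) -> Requirement:
    """Like the REST catalog's assert-ref-snapshot-id, restricted to main."""
    return lambda md: md.main_snapshot_id == expected


def assert_custom_value(key: str, expected: Optional[str]) -> Requirement:
    """Hypothetical requirement: the custom field must still hold `expected`."""
    return lambda md: md.custom.get(key) == expected


class CommitFailed(Exception):
    pass


class InMemoryCatalog:
    """Toy catalog: requirements are checked and updates applied under one lock."""

    def __init__(self, metadata: TableMetadata) -> None:
        self._md = metadata
        self._lock = threading.Lock()

    def current(self) -> TableMetadata:
        with self._lock:
            return TableMetadata(self._md.main_snapshot_id, dict(self._md.custom))

    def commit(
        self,
        requirements: List[Requirement],
        new_snapshot_id: int,
        custom_updates: Optional[Dict[str, str]] = None,
    ) -> None:
        with self._lock:
            if not all(req(self._md) for req in requirements):
                raise CommitFailed("a commit requirement was not met")
            self._md.main_snapshot_id = new_snapshot_id
            self._md.custom.update(custom_updates or {})


if __name__ == "__main__":
    catalog = InMemoryCatalog(TableMetadata(1, {"watermark": "100"}))
    seen = catalog.current()  # our writer reads snapshot 1, watermark 100
    catalog.commit([], 2)     # an external writer appends snapshot 2
    try:
        catalog.commit([assert_ref_snapshot_id(seen.main_snapshot_id)], 3, {"watermark": "200"})
    except CommitFailed:
        print("snapshot-based requirement: conflict, must refresh and retry")
    catalog.commit([assert_custom_value("watermark", seen.custom["watermark"])], 3, {"watermark": "200"})
    print(catalog.current())
```

Because the watermark lives in table metadata rather than in a snapshot summary, snapshot expiry cannot remove it. A commit conditioned on that field is not rejected merely because another writer advanced `main`, while a stale writer that still expects `100` is.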