+1, selective snapshot compaction would be a good addition for streaming/low-latency commit workloads. A tradeoff is that it requires users to opt in to more Iceberg maintenance, which isn't always feasible, as you mentioned above.
I think both options would work in tandem:
- Short term: optimize the read path (e.g. lazily load the snapshotLog); a rough sketch of the idea follows at the end of this message
- Longer term: explore options such as selective snapshot compaction, storing snapshots in separate storage from metadata.json, and improvements to the REST catalog

On Wed, May 6, 2026 at 1:05 AM Péter Váry <[email protected]> wrote:

> Another question we should consider:
> - Do we really need to keep all these snapshots?
>
> Let's consider a table with the following history: S1, S2, S3, S4. If we
> don't have equality deletes, could we create an S2' with only metadata
> changes which would contain everything from S2 and S3? If we rewrite the
> table history to S1, S2', S4, then we can reduce the number of snapshots
> we need to keep.
>
> Selective snapshot compaction is something which could be useful for many
> cases.
>
> On Tue, May 5, 2026, 17:38 Amogh Jahagirdar <[email protected]> wrote:
>
>> Thanks Grant,
>>
>> The use case where there are commits every 30 seconds and simultaneously
>> there's also a 30-day retention does seem unique to me, but overall I do
>> support simple implementation changes to improve that situation, so I
>> will take a deeper look at the PR.
>>
>> In particular, I'd need to check time-travel queries (and rollbacks) in
>> this model, since those cases would need to load the snapshot log anyway.
>> Rollbacks should be less frequent, but if time-travel queries are also
>> common in this situation, the history will need to be loaded anyway,
>> limiting the benefit of optimizing the history load.
>>
>> I think there's also a tradeoff here worth considering: for tables that
>> are moving fast enough, the utility of caching table metadata is reduced.
>> So for higher-frequency write tables where reads are not as frequent, it
>> may be worth considering simply not caching the table metadata and
>> hitting the catalog directly, rather than optimizing the memory footprint
>> of metadata that has a lower cache hit rate.
>>
>> Also, I don't think there's anything V4-specific about this; it's just
>> calling out a potential implementation improvement independent of table
>> format or catalog spec.
>>
>> Thanks,
>> Amogh Jahagirdar
>>
>> On Tue, May 5, 2026 at 7:57 AM Grant Nicholas
>> <[email protected]> wrote:
>>
>>> Reviving this thread.
>>>
>>> The discussion focused mostly on optimizing the write path of
>>> metadata.json, but we've been seeing significant memory pressure on the
>>> read path as well.
>>>
>>> In Trino, most queries are reads, and many TableMetadata instances can
>>> be cached in coordinator memory. With large numbers of snapshots (e.g.
>>> streaming workloads with 30-day retention), both `snapshots` and
>>> `snapshotLog` scale linearly and become large contributors to heap usage.
>>>
>>> Iceberg already supports lazy loading for `snapshots`, so I explored
>>> applying a similar approach to `snapshotLog`. Conceptually, these two
>>> fields have similar scaling characteristics, so it seemed reasonable to
>>> treat them consistently.
>>>
>>> I put together a prototype here:
>>> https://github.com/apache/iceberg/pull/16207
>>>
>>> Curious if others have seen similar memory pressure issues, especially
>>> in singleton coordinators where metadata is cached across many tables.
>>>
>>> Grant
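For anyone following along, here is a minimal, hypothetical sketch of the lazy snapshot-log idea in Java. It is not the actual change in the linked PR, and `LogEntry`/`LazySnapshotLog` are made-up names rather than Iceberg classes; the point is only that a cached TableMetadata could hold a small supplier instead of a fully materialized history list.

```java
import java.util.List;
import java.util.function.Supplier;

// Simplified stand-in for the snapshot-log entries (snapshot id + timestamp);
// not Iceberg's real HistoryEntry class.
record LogEntry(long snapshotId, long timestampMillis) {}

// Hypothetical sketch: the parsed snapshot log is not held in the cached
// metadata object; it is materialized only when first requested.
final class LazySnapshotLog {
  private final Supplier<List<LogEntry>> loader; // e.g. re-parses the snapshot-log section
  private volatile List<LogEntry> entries;       // stays null for read-only planning

  LazySnapshotLog(Supplier<List<LogEntry>> loader) {
    this.loader = loader;
  }

  // Time-travel and rollback paths call this and pay the loading cost once.
  List<LogEntry> entries() {
    List<LogEntry> result = entries;
    if (result == null) {
      synchronized (this) {
        if (entries == null) {
          entries = loader.get();
        }
        result = entries;
      }
    }
    return result;
  }
}
```

Whether the deferred load re-reads metadata.json or goes back to the catalog is an implementation choice; the memory win comes from the coordinator cache holding only the small wrapper for tables whose history is never actually queried.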

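And purely as an illustration of the selective snapshot compaction idea from Péter's message (hypothetical types, not an Iceberg API): for append-only snapshots, a replacement snapshot S2' can be described as the union of the files added by S2 and S3, collapsing the history S1, S2, S3, S4 to S1, S2', S4.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: a snapshot identified by an id plus the data files it appended.
// This is NOT Iceberg's Snapshot class, just a sketch of the squashing idea.
record SimpleSnapshot(long id, List<String> addedFiles) {}

final class SnapshotSquasher {

  // Squash a contiguous run of append-only snapshots into a single replacement
  // snapshot whose added files are the union of the originals (S2 + S3 -> S2').
  static SimpleSnapshot squash(long newId, List<SimpleSnapshot> run) {
    List<String> combined = new ArrayList<>();
    for (SimpleSnapshot s : run) {
      combined.addAll(s.addedFiles());
    }
    return new SimpleSnapshot(newId, combined);
  }
}
```

As Péter notes, this only works straightforwardly without equality deletes; with deletes in the squashed range, S2' would have to reproduce their net effect.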