Reviving this thread. The discussion so far has focused mostly on optimizing the write path for metadata.json, but we’ve also been seeing significant memory pressure on the read path.
In Trino, most queries are reads, and many TableMetadata instances can be cached in coordinator memory. For tables with large numbers of snapshots (e.g., streaming workloads with 30-day retention), both `snapshots` and `snapshotLog` grow linearly with the snapshot count and become major contributors to heap usage. For example, a table that commits once a minute keeps roughly 43,200 snapshots (60 × 24 × 30) under 30-day retention.

Iceberg already supports lazy loading for `snapshots`, so I explored applying a similar approach to `snapshotLog`. The two fields have similar scaling characteristics, so it seemed reasonable to treat them consistently. I put together a prototype here: https://github.com/apache/iceberg/pull/16207

Curious whether others have seen similar memory pressure, especially with singleton coordinators where metadata is cached across many tables.

Grant
