Reviving this thread.

The discussion focused mostly on optimizing the write path of
metadata.json, but we’ve been seeing significant memory pressure on the
read path as well.

In Trino, most queries are reads, and many TableMetadata instances can be
cached in coordinator memory. With large numbers of snapshots (e.g.
streaming workloads with 30-day retention), both `snapshots` and
`snapshotLog` grow linearly with the snapshot count and become large
contributors to heap usage.

Iceberg already supports lazy loading for `snapshots`, so I explored
applying a similar approach to `snapshotLog`. Conceptually, these two
fields have similar scaling characteristics, so it seemed reasonable
to treat them consistently.
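
To make the idea concrete, here is a minimal sketch of the lazy-loading
pattern in plain Java. It is not the code in the prototype or in Iceberg
itself: `LazySnapshotLogSketch`, `LogEntry`, and `LazyList` are hypothetical
names used only for illustration, and the supplier stands in for whatever
would actually parse the log out of metadata.json.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class LazySnapshotLogSketch {

  /** Hypothetical stand-in for one snapshot-log entry. */
  static final class LogEntry {
    final long timestampMillis;
    final long snapshotId;

    LogEntry(long timestampMillis, long snapshotId) {
      this.timestampMillis = timestampMillis;
      this.snapshotId = snapshotId;
    }
  }

  /** Holds a loader and materializes the list only on first access. */
  static final class LazyList<T> {
    private final Supplier<List<T>> loader;
    private volatile List<T> loaded; // null until get() is first called

    LazyList(Supplier<List<T>> loader) {
      this.loader = loader;
    }

    List<T> get() {
      List<T> result = loaded;
      if (result == null) {
        synchronized (this) {
          result = loaded;
          if (result == null) {
            // Run the loader once and keep an immutable copy.
            result = List.copyOf(loader.get());
            loaded = result;
          }
        }
      }
      return result;
    }

    boolean isLoaded() {
      return loaded != null;
    }
  }

  public static void main(String[] args) {
    LazyList<LogEntry> snapshotLog = new LazyList<>(() -> {
      // Assumption: in a real reader this would parse the log entries
      // from the metadata file; here we just build them in memory.
      List<LogEntry> entries = new ArrayList<>();
      for (long i = 0; i < 3; i++) {
        entries.add(new LogEntry(1_000L * i, i));
      }
      return entries;
    });

    System.out.println("loaded before access: " + snapshotLog.isLoaded());
    System.out.println("entries: " + snapshotLog.get().size());
    System.out.println("loaded after access: " + snapshotLog.isLoaded());
  }
}
```

With this shape, a cached metadata object that never has its log read only
pays for the loader reference, not for the full list of entries.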

I put together a prototype here:
https://github.com/apache/iceberg/pull/16207

Curious if others have seen similar memory pressure issues, especially in
singleton coordinators where metadata is cached across many tables.

Grant
