grantatspothero commented on PR #16207: URL: https://github.com/apache/iceberg/pull/16207#issuecomment-4373457460
Our problem was excessive memory usage from caching `TableMetadata` on the client side. Storing a `List<HistoryEntry>` in memory is fine for a small number of snapshots, but each entry takes ~32 bytes, and this adds up quickly when a single coordinator service caches Iceberg metadata for many tables in memory.

Example:
- 1000 tables' metadata cached in memory
- each table commits every 30s, with 30 days of snapshot retention = 2*60*24*30 = 86,400, or ~100K snapshots in Iceberg metadata
- 32 bytes * 100K = 3.2 MB per table
- 3.2 MB/table * 1000 tables = 3.2 GB

Note: this is "resident set size", not "total allocations", which tends to be significantly higher due to intermediate allocations while parsing JSON.

For multi-tenant coordinator services (e.g. commit services, cache services) this memory usage is a problem.

The biggest memory hog is by far the snapshots array, but snapshotLog is the next biggest. Since we already defer snapshots, it seemed reasonable to defer snapshotLog.
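
For anyone who wants to plug in their own numbers, the estimate can be reproduced with a small sketch like the one below. It is only back-of-the-envelope arithmetic, not Iceberg code: the class and method names are hypothetical, and the 32 bytes per entry is the approximate figure quoted in this comment rather than a measured value. It uses the exact snapshot count (86,400) instead of rounding up to 100K, so the results come out slightly lower than the rounded figures.

```java
// Rough estimate of resident memory held by cached snapshotLog entries.
// All names are hypothetical; this does not use or mirror any Iceberg API.
final class SnapshotLogMemoryEstimate {
  private static final long SECONDS_PER_DAY = 24L * 60 * 60;

  // Snapshots retained when a table commits once every commitIntervalSeconds
  // and keeps retentionDays of snapshot history.
  static long snapshotsPerTable(long commitIntervalSeconds, long retentionDays) {
    return retentionDays * SECONDS_PER_DAY / commitIntervalSeconds;
  }

  // Bytes held by one table's snapshotLog, given an assumed per-entry size.
  static long bytesPerTable(long snapshots, long bytesPerEntry) {
    return snapshots * bytesPerEntry;
  }

  // Bytes held across every table cached by the coordinator.
  static long totalBytes(long cachedTables, long bytesPerTable) {
    return cachedTables * bytesPerTable;
  }

  public static void main(String[] args) {
    long snapshots = snapshotsPerTable(30, 30);  // commit every 30s, keep 30 days
    long perTable = bytesPerTable(snapshots, 32); // ~32 bytes per entry (assumed)
    long total = totalBytes(1000, perTable);      // 1000 cached tables

    System.out.printf("snapshots per table: %d%n", snapshots);
    System.out.printf("snapshotLog bytes per table: %d%n", perTable);
    System.out.printf("snapshotLog bytes across cache: %d%n", total);
  }
}
```

The per-entry size is the most uncertain input here, so it is worth measuring on the actual JVM (object headers and compressed-oops settings change it) before relying on the totals.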
