rahulsmahadev opened a new pull request, #16762: URL: https://github.com/apache/iceberg/pull/16762

## Summary

When `io.manifest.cache-enabled` is set, manifest files are served through `ContentCache`, but manifest **list** files are not: `BaseSnapshot.cacheManifests` reads the list with a raw `FileIO.newInputFile` call. Every freshly loaded `Snapshot` (table refresh, new table handle, streaming poll) therefore re-fetches the same immutable manifest-list file from object storage, even when the content cache is enabled and warm.

This change routes the manifest-list read through the same content cache used for manifests:

- `ManifestFiles.newInputFile(FileIO, ManifestListFile)`: package-private twin of the existing `newInputFile(FileIO, ManifestFile)` helper. It wraps the input with `ContentCache.tryCache` when caching is enabled for the `FileIO`.
- `BaseSnapshot.cacheManifests` uses it for the manifest-list read.

Manifest lists are immutable and location-unique like manifests, so the existing cache keying and invalidation semantics apply unchanged. Encrypted manifest lists follow the same contract as manifests today: the `FileIO` (e.g. `EncryptingFileIO`) controls decryption, and caching behavior is identical to the manifest path.

## Test plan

- New `TestManifestCaching.testManifestListCaching`: a freshly parsed snapshot loads the manifest list through the cache (one miss, cache size increases), and a second snapshot instance reading the same list is served from the cache (no new miss, hit count increases).
- Updated `testPlanWithCache` expectations: with the change, each append commit also caches one manifest list (parent lists are read while committing, and the current snapshot's list while planning), so the cache holds `numFiles * 2` entries.
- Verified locally: `TestManifestCaching` (6/6), `TestManifestListVersions`, `TestSnapshot`, `TestCommitReporting`, `TestManifestListEncryption`, plus spotless and checkstyle.
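
For readers unfamiliar with the pattern, the read-through behavior that the new test checks (first read of a location is a miss, a later read of the same location is a hit) can be illustrated with a minimal, self-contained sketch. Everything here is hypothetical: `SketchContentCache`, its `read` method, and its counters are stand-ins invented for illustration and are not Iceberg's `ContentCache` API or the code in this PR.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical location-keyed read-through cache for immutable files.
 * Illustrative only: not thread-safe, no eviction, no size limits.
 */
final class SketchContentCache {
  private final Map<String, byte[]> entries = new HashMap<>();
  private int hits = 0;
  private int misses = 0;

  /** Returns the cached bytes for a location, calling the loader only on a miss. */
  byte[] read(String location, Function<String, byte[]> loader) {
    byte[] cached = entries.get(location);
    if (cached != null) {
      hits++;
      return cached;
    }
    // Miss: fetch from the (simulated) object store and remember the result.
    // This is only safe because the file at a given location never changes.
    misses++;
    byte[] loaded = loader.apply(location);
    entries.put(location, loaded);
    return loaded;
  }

  int hits() {
    return hits;
  }

  int misses() {
    return misses;
  }

  int size() {
    return entries.size();
  }
}
```

Calling `read` twice with the same location and a loader that counts its invocations would show the loader running once, which mirrors the "no new miss, hit count increases" expectation in the test plan.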
