mao-liu opened a new pull request, #8186: URL: https://github.com/apache/paimon/pull/8186
### Purpose

In Paimon v1.3 (prior to https://github.com/apache/paimon/commit/960dce1a18cccb3beac1d5ae0c8a1f59414498ae), cold-filling the manifest cache caused a significant heap memory spike. This problem was raised and discussed in https://github.com/apache/paimon/issues/7030 and https://github.com/apache/paimon/pull/7031. It is particularly evident for highly partitioned tables in jobs with high parallelism.

While the heap spike is mostly resolved by https://github.com/apache/paimon/commit/960dce1a18cccb3beac1d5ae0c8a1f59414498ae, this PR proposes some additional manifest cache options to help tune the manifest cache for highly partitioned tables in jobs with high parallelism.

When many high-parallelism writers restore at the same time, the Job Manager's manifest cache can become a memory bottleneck. The cache holds entries with soft references, so under sustained heap pressure the JVM reclaims entries that are then immediately re-read and decompressed, driving heap usage back up and triggering further reclamation: a cache-thrash spiral. There was previously no way to tune this behavior.

This PR exposes additional manifest-cache controls and a prefetch option to make this tunable:

- Added `WriteRestoreScanBenchmark`, a micro-benchmark that reproduces the manifest-cache cold-fill memory spike and reports heap/cache footprint across cache-disabled vs. cache-enabled (strong-ref) arms. On Paimon v1.3, this benchmark reveals a significant heap memory spike during cold-filling on the cache-enabled path. The spike is no longer present after https://github.com/apache/paimon/commit/960dce1a18cccb3beac1d5ae0c8a1f59414498ae, but the benchmark can still be useful for measuring performance and detecting regressions in the future.
- `SegmentsCache` now supports a configurable idle TTL (`expire-after-access`) and a `soft-values` toggle.
  Setting `soft-values=false` pins the working set with strong references so the thrash spiral cannot start; the cache then stays bounded by weight (up to its configured memory). The defaults preserve the existing behavior (soft references on).
- New catalog option:
  - `cache.manifest.soft-values` (default `true`): toggles soft/strong references for the catalog manifest cache. The catalog manifest cache continues to inherit the catalog-wide `cache.expire-after-access` TTL.
- New writer-coordinator options:
  - `sink.writer-coordinator.cache-soft-values` (default `true`): the same soft/strong reference toggle for the coordinator manifest cache.
  - `sink.writer-coordinator.cache-expire-after-access` (default disabled): optional idle TTL for coordinator cache entries; the cache stays bounded by `sink.writer-coordinator.cache-memory` regardless.
  - `sink.writer-coordinator.prefetch-manifests` (default `false`): eagerly reads all data manifests of the latest snapshot during refresh to warm the in-Job-Manager manifest cache once, avoiding many concurrent cold manifest reads when writers restore simultaneously.
- Docs: documented the new options and added a "Write Initialize" section in `write-performance.md` explaining when these settings help, the failure mechanism, and how they resolve it.

### Tests

- `SegmentsCacheTest`: covers defaults (soft refs on, no TTL), getter pass-through, `create` returning null on zero memory, and that strong references stay bounded by weight-based eviction.
- `CachingCatalogTest#testManifestCacheOptions`: asserts the catalog manifest cache picks up `soft-values` and inherits the catalog idle TTL.
- `TableWriteCoordinatorTest`:
  - `testBuildManifestCacheOptions` verifies that the coordinator options map to the cache (default soft refs + no TTL, explicit TTL honored, `soft-values=false` switches to strong refs, zero memory disables the cache).
  - `testPrefetchManifestsWarmsCache` verifies that constructing the coordinator with prefetch enabled warms the cache and that scan results remain correct.
- Regenerated config docs verified by `ConfigOptionsDocsCompletenessITCase`.
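
For readers unfamiliar with the "strong references stay bounded by weight-based eviction" property mentioned above, here is a minimal, self-contained sketch of the idea. It is not Paimon's `SegmentsCache`: the class `WeightBoundedCache`, its methods, and the use of `byte[].length` as the entry weight are all hypothetical and chosen only for illustration. Entries are held by ordinary (strong) references, so the garbage collector never drops them under heap pressure; the only thing that removes entries is the explicit byte budget.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; this is NOT Paimon's SegmentsCache. */
public class WeightBoundedCache {

    private final long maxWeightBytes;
    private long currentWeightBytes = 0;

    // accessOrder=true: iteration starts at the least recently used entry.
    // Values are strongly referenced, so the GC cannot reclaim them.
    private final LinkedHashMap<String, byte[]> entries =
            new LinkedHashMap<>(16, 0.75f, true);

    public WeightBoundedCache(long maxWeightBytes) {
        this.maxWeightBytes = maxWeightBytes;
    }

    /** Inserts or replaces an entry, then evicts LRU entries until within budget. */
    public void put(String key, byte[] value) {
        byte[] previous = entries.put(key, value);
        if (previous != null) {
            currentWeightBytes -= previous.length;
        }
        currentWeightBytes += value.length;
        evictUntilWithinBound();
    }

    /** Returns the cached value (marking it recently used), or null if absent. */
    public byte[] get(String key) {
        return entries.get(key);
    }

    public long weightBytes() {
        return currentWeightBytes;
    }

    public int size() {
        return entries.size();
    }

    private void evictUntilWithinBound() {
        Iterator<Map.Entry<String, byte[]>> it = entries.entrySet().iterator();
        while (currentWeightBytes > maxWeightBytes && it.hasNext()) {
            Map.Entry<String, byte[]> eldest = it.next();
            currentWeightBytes -= eldest.getValue().length;
            it.remove();
        }
    }
}
```

In this sketch, memory use is capped by the constructor argument rather than by GC behavior, which is the trade-off the `soft-values=false` setting is described as making: a predictable, pinned working set at the cost of holding that memory for as long as the entries stay in the cache.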
