TerryTaoYY opened a new pull request, #21306:
URL: https://github.com/apache/kafka/pull/21306

JIRA: KAFKA-12641

## Summary

This PR bounds leader-epoch cache growth in `RemoteLogMetadataCache` and clarifies/strengthens the semantics of the per-epoch “highest successfully copied offset” as a monotonic progress watermark.

## Motivation

- `RemoteLogMetadataCache` maintains per-leader-epoch state. If epoch entries are never removed once all segments for an epoch have been deleted, the cache can grow without bound over time (KAFKA-12641).
- The management subsystem consumes `highestOffsetForEpoch` as a progress signal. This value should be a monotonic watermark of what has been successfully copied, and it should not regress or disappear due to retention-driven deletes.

## Changes

### 1) Bound leader-epoch entry growth (KAFKA-12641)

- Remove leader-epoch entries once all segments for that epoch have been deleted to avoid unbounded growth.

### 2) Make `highestOffsetForEpoch` a pure lookup of a monotonic watermark

- Maintain a per-epoch “highest successfully copied offset” watermark updated on `COPY_SEGMENT_FINISHED` and merged via `max`.
- `highestOffsetForEpoch(int leaderEpoch)` is now a pure lookup (no fallback / hidden side-effects).
- Add a TODO noting that any future snapshot/rehydration path that bypasses `COPY_SEGMENT_FINISHED` would need to rebuild this map.

### 3) Documentation clarifications

- Clarify in Javadocs that the monotonic watermark may be higher than the end offsets of segments currently present in remote storage after retention-driven deletes.

### 4) Teardown/lifecycle behavior

- Do not explicitly clear internal maps on partition removal/close; instead rely on removing references to the cache to avoid surprising concurrency/lifecycle side effects.
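
To make the interaction between changes 1) and 2) concrete, here is a minimal Java sketch of the per-epoch bookkeeping they describe. It is a simplified model under stated assumptions, not the code in this PR: the class `EpochWatermarkSketch`, the methods `onCopyFinished`, `onSegmentDeleted` and `hasEpochEntry`, and the use of bare string segment ids are all hypothetical (the real cache works with full segment metadata and its state transitions). The sketch assumes that removing an epoch's segment-tracking entry leaves its watermark in place, following the requirement that the watermark must not regress or disappear due to retention-driven deletes.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of per-epoch bookkeeping; NOT the actual
// RemoteLogMetadataCache implementation from this PR.
class EpochWatermarkSketch {

    // leader epoch -> ids of that epoch's segments still present in remote storage.
    // The inner sets are only mutated inside compute/computeIfPresent, which
    // ConcurrentHashMap runs atomically per key.
    private final Map<Integer, Set<String>> liveSegmentsByEpoch = new ConcurrentHashMap<>();

    // leader epoch -> highest end offset successfully copied for that epoch.
    // Only ever raised; never lowered or removed.
    private final Map<Integer, Long> highestCopiedOffsetByEpoch = new ConcurrentHashMap<>();

    // Analogous to handling a COPY_SEGMENT_FINISHED event for one segment.
    void onCopyFinished(int leaderEpoch, String segmentId, long endOffset) {
        liveSegmentsByEpoch.compute(leaderEpoch, (epoch, segments) -> {
            Set<String> live = (segments == null) ? new HashSet<>() : segments;
            live.add(segmentId);
            return live;
        });
        // Merge via max so a late or out-of-order event cannot move the watermark backwards.
        highestCopiedOffsetByEpoch.merge(leaderEpoch, endOffset, Long::max);
    }

    // Analogous to a retention-driven delete finishing for one segment.
    void onSegmentDeleted(int leaderEpoch, String segmentId) {
        // Returning null from the remapping function removes the epoch entry,
        // so it disappears as soon as its last live segment is gone.
        liveSegmentsByEpoch.computeIfPresent(leaderEpoch, (epoch, segments) -> {
            segments.remove(segmentId);
            return segments.isEmpty() ? null : segments;
        });
        // The watermark is deliberately left untouched: it may now exceed the
        // end offset of every segment still in remote storage.
    }

    // Pure lookup: no fallback computation and no side effects.
    Optional<Long> highestOffsetForEpoch(int leaderEpoch) {
        return Optional.ofNullable(highestCopiedOffsetByEpoch.get(leaderEpoch));
    }

    // Exposed only so the bounded-growth behavior can be observed.
    boolean hasEpochEntry(int leaderEpoch) {
        return liveSegmentsByEpoch.containsKey(leaderEpoch);
    }
}
```

For example, copying two segments for epoch 3 that end at offsets 100 and 250 and then deleting both removes epoch 3 from `liveSegmentsByEpoch`, while `highestOffsetForEpoch(3)` still returns 250. Returning `null` from a `ConcurrentHashMap` remapping function removes the mapping atomically, which is what lets the epoch entry vanish exactly when its last live segment is deleted without a separate check-then-remove step.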

## Testing

Executed locally (both succeeded):

```text
./gradlew :storage:test --tests org.apache.kafka.server.log.remote.metadata.storage.RemoteLogMetadataCacheTest
./gradlew :storage:test
```

## Files touched

- storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/RemoteLogMetadataCache.java
- storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/RemotePartitionMetadataStore.java
- storage/api/src/main/java/org/apache/kafka/server/log/remote/storage/RemoteLogMetadataManager.java
