Hi everyone,

I'd like to start a discussion on KIP-1241, which aims to reduce redundant data in remote storage by delaying tiered storage uploads.

KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-1241%3A+Reduce+tiered+storage+redundancy+with+delayed+upload
Draft PR: https://github.com/apache/kafka/pull/20913

Problem:

Currently, Kafka's tiered storage implementation uploads all non-active local log segments to remote storage immediately, even when they are still within the local retention period. As a result, the same data is stored in both the local and remote tiers. When there is no requirement for real-time analytics or immediate consumption from remote storage, this has the following drawbacks:

1. Wastes storage capacity and cost: the same data is stored twice during the local retention window.
2. Provides no immediate benefit: during the local retention period, reads prioritize local data, making the remote copy unnecessary.

So this KIP proposes reducing tiered storage redundancy by delaying the upload.

You can see an example test result here: https://github.com/apache/kafka/pull/20913#issuecomment-3547156286

Looking forward to your feedback!

Best regards,
Jian
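
For a rough sense of the redundancy described above, a back-of-the-envelope sketch follows. It is not taken from the KIP or the PR. It assumes a steady ingest rate, ignores the active segment, and assumes a delayed upload happens at about the end of the local retention window, which may differ from what the KIP actually specifies. The function name, parameters, and numbers are all hypothetical.

```python
GIB = 1024 ** 3


def remote_footprint_bytes(ingest_bytes_per_hour, total_retention_hours,
                           local_retention_hours, delayed_upload):
    """Approximate steady-state remote-tier size for one partition.

    Simplified model (an assumption, not Kafka's actual accounting):
    - immediate upload: remote holds the full retention window
    - delayed upload:   remote holds only data older than local retention
    """
    if local_retention_hours > total_retention_hours:
        raise ValueError("local retention cannot exceed total retention")
    if delayed_upload:
        remote_hours = total_retention_hours - local_retention_hours
    else:
        remote_hours = total_retention_hours
    return ingest_bytes_per_hour * remote_hours


# Hypothetical workload: 1 GiB/hour, 7 days total retention, 1 day local.
immediate = remote_footprint_bytes(1 * GIB, 7 * 24, 24, delayed_upload=False)
delayed = remote_footprint_bytes(1 * GIB, 7 * 24, 24, delayed_upload=True)

saved_bytes = immediate - delayed        # remote copy of data still held locally
saved_fraction = saved_bytes / immediate  # equals local / total retention here
print(f"saved {saved_bytes / GIB:.0f} GiB ({saved_fraction:.1%} of remote tier)")
```

Under this model the saving is simply the ratio of local retention to total retention, so the benefit grows as the local retention window becomes a larger share of the total.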
