Hi everyone,

I'd like to start a discussion on KIP-1241. Its goal is to reduce redundant
data in remote storage.

KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1241%3A+Reduce+tiered+storage+redundancy+with+delayed+upload

Draft PR:
https://github.com/apache/kafka/pull/20913

Problem:

Currently, Kafka's tiered storage implementation uploads all non-active local
log segments to remote storage immediately, even when they are still within
the local retention period. This results in redundant storage of the same data
in both local and remote tiers.

When there is no requirement for real-time analytics or immediate
consumption based on remote storage, this behavior has the following drawbacks:

1. Wastes storage capacity and costs: The same data is stored twice during
the local retention window
2. Provides no immediate benefit: During the local retention period, reads
prioritize local data, making the remote copy unnecessary


So, this KIP proposes to reduce tiered storage redundancy by delaying the upload.
You can check the test result example here directly:
https://github.com/apache/kafka/pull/20913#issuecomment-3547156286
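
For a rough feel of the redundancy described above, here is a minimal
back-of-the-envelope sketch. It is not taken from the KIP or the PR: the
function name and the `upload_delay_sec` parameter are hypothetical, and the
model assumes a constant ingest rate while ignoring the active segment,
replication factor, and segment roll granularity.

```python
def redundant_bytes(ingest_bytes_per_sec, local_retention_sec, upload_delay_sec=0):
    """Approximate steady-state bytes held in BOTH tiers for one partition.

    Hypothetical model: a segment becomes redundant once it is uploaded and
    stays redundant until local retention deletes the local copy.
    """
    # Time window during which a segment exists both locally and remotely.
    overlap_sec = max(0, local_retention_sec - upload_delay_sec)
    return ingest_bytes_per_sec * overlap_sec


# Example: 10 MB/s ingest, 24 h local retention.
immediate = redundant_bytes(10_000_000, 86_400)                          # upload right away
half_delay = redundant_bytes(10_000_000, 86_400, upload_delay_sec=43_200)  # wait 12 h
full_delay = redundant_bytes(10_000_000, 86_400, upload_delay_sec=86_400)  # wait 24 h
print(immediate, half_delay, full_delay)
```

Under these assumptions, immediate upload keeps about 864 GB duplicated, a
12-hour delay halves that, and a delay equal to the local retention period
removes the overlap entirely.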

Looking forward to your feedback!

Best regards,
Jian
