RE: [DISCUSS] KIP-1241: Reduce tiered storage redundancy with delayed upload

Haiying Cai via dev Sun, 30 Nov 2025 20:40:18 -0800

Jian,

Thanks for the contribution.  But I feel that uploading local segment files 
to remote storage ASAP is advantageous in several scenarios:

1. It enables fast bootstrapping of a new broker.  A new broker doesn’t have to 
replicate all the data from the leader broker; it only needs to replicate the 
data from the end of the last remote log segment to the current end of 
the topic (LSO), since all the other data is in remote tiered storage and 
can be downloaded lazily later.  This is what KIP-1023 is trying to solve;
2. Although nobody has proposed a KIP to allow a consumer client to read from 
remote tiered storage directly, this would help a fall-behind 
consumer do catch-up reads or perform a backfill.  This path allows the 
consumer backfill to finish without polluting the broker’s page cache.  The 
earlier the data is on remote tiered storage, the more advantageous it is 
for the client.
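
To make point 1 concrete, here is a rough sketch of how the upload delay changes the amount of data a new broker has to copy from the leader.  The ingest rate, the 10-minute untiered window and the 6-hour delay are made-up numbers for illustration, not measurements, and `catch_up_bytes` is a hypothetical helper, not a Kafka API.

```python
def catch_up_bytes(ingest_bytes_per_sec: int, untiered_seconds: int) -> int:
    """Bytes a new follower must copy from the leader.

    Assumes everything already in remote storage can be skipped, so only
    the time span not yet uploaded has to be replicated.
    """
    return ingest_bytes_per_sec * untiered_seconds


MB = 1024 ** 2
ingest = 10 * MB                 # hypothetical per-broker ingest rate
untiered_now = 10 * 60           # assumed: ~10 min of data not yet uploaded
upload_delay = 6 * 60 * 60       # assumed: 6 h delay before upload

immediate = catch_up_bytes(ingest, untiered_now)
delayed = catch_up_bytes(ingest, untiered_now + upload_delay)

print(f"immediate upload: {immediate / MB:,.0f} MB to replicate")
print(f"delayed upload:   {delayed / MB:,.0f} MB to replicate")
print(f"ratio:            {delayed / immediate:.0f}x")
```

Under these assumptions the data to replicate grows in proportion to the delay, which is the bootstrapping cost that early upload avoids.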

I think in your proposal you are delaying the upload of the segment, but the file 
will still be uploaded at a later time.  I guess this can save a few hours of 
storage cost for that file in remote storage, but I’m not sure whether that is a 
significant saving (if the file needs to stay in remote tiered storage for 
several days or weeks due to the retention policy).
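
The cost question above can be put as a back-of-the-envelope calculation.  This sketch assumes steady-state ingest, that a segment is removed from remote storage once it is older than the total retention (`retention.ms`), and that the upload delay is a single fixed value; the numbers and the helper name are hypothetical.

```python
def remote_storage_saving(upload_delay_hours: float,
                          total_retention_hours: float) -> float:
    """Fraction of remote storage saved by delaying the upload.

    With immediate upload a segment sits in remote storage for roughly the
    whole retention period; with a delay it sits there for (retention - delay).
    """
    if total_retention_hours <= 0:
        raise ValueError("total retention must be positive")
    # A delay longer than the retention means the segment is never uploaded.
    delay = min(max(upload_delay_hours, 0.0), total_retention_hours)
    return delay / total_retention_hours


for days in (3, 7, 14):
    saving = remote_storage_saving(6, days * 24)   # assumed 6 h delay
    print(f"{days:>2} days retention: {saving:.1%} of remote storage saved")
```

With a 6-hour delay and 7 days of retention the saving is roughly 3.6%; it only becomes large when the delay is a sizeable fraction of the total retention.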

On 2025/11/19 13:29:11 jian fu wrote:
> Hi everyone, I'd like to start a discussion on KIP-1241. The goal is to
> reduce remote storage usage. KIP:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1241%3A+Reduce+tiered+storage+redundancy+with+delayed+upload
> 
> The Draft PR: https://github.com/apache/kafka/pull/20913
> 
> Problem: Currently,
> Kafka's tiered storage implementation uploads all non-active local log
> segments to remote storage immediately, even when they are still within the
> local retention period.
> This results in redundant storage of the same data in both local and remote
> tiers.
> 
> When there is no requirement for real-time analytics or immediate
> consumption based on remote storage, this has the following drawbacks:
> 
> 1. Wastes storage capacity and costs: The same data is stored twice during
> the local retention window
> 2. Provides no immediate benefit: During the local retention period, reads
> prioritize local data, making the remote copy unnecessary
> 
> 
> So, this KIP aims to reduce tiered storage redundancy with delayed upload.
> You can check the test result example here directly:
> https://github.com/apache/kafka/pull/20913#issuecomment-3547156286
> 
> Looking forward to your feedback!
> 
> Best regards, Jian
>
