Re: Re: [DISCUSS] KIP-1241: Reduce tiered storage redundancy with delayed upload

jian fu Wed, 28 Jan 2026 19:42:23 -0800

Hi Kamal:

Thanks for your feedback.
I have updated the KIP and the related code in the draft PR. Could you
please review it again? Thanks!


Hi Luke and all,

Following Kamal's suggestions, I have made a few small updates to this KIP.
Here is a summary so that everyone is aware of them:
(1) Renamed the config to remote.copy.lazy.enable to reduce confusion.
(2) Added a broker-level config, log.remote.copy.lazy.enable, so that the
feature can be enabled for all topics, including newly created ones.
(3) Documented the analysis of the corner case "local retention = remote
retention".
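
A minimal sketch of how the two configs in (2) might combine, assuming they follow the usual Kafka topic-over-broker precedence (as with retention.ms and log.retention.ms): a topic-level remote.copy.lazy.enable override wins, otherwise the broker-level log.remote.copy.lazy.enable applies. The function name and the default of false (eager upload, today's behaviour) are assumptions for illustration, not taken from the PR.

```python
from typing import Mapping, Optional

# Assumed default: lazy copy is off unless explicitly enabled, so the
# current eager-upload behaviour is unchanged.
DEFAULT_LAZY_COPY = False


def effective_lazy_copy(topic_overrides: Mapping[str, str],
                        broker_config: Mapping[str, str]) -> bool:
    """Hypothetical helper: resolve whether lazy copy applies to one topic.

    A topic-level override takes precedence; otherwise the broker-level
    default is used; otherwise the assumed built-in default applies.
    """
    value: Optional[str] = topic_overrides.get("remote.copy.lazy.enable")
    if value is None:
        value = broker_config.get("log.remote.copy.lazy.enable")
    if value is None:
        return DEFAULT_LAZY_COPY
    # Compare case-insensitively; rejecting invalid strings is omitted here.
    return value.strip().lower() == "true"


broker = {"log.remote.copy.lazy.enable": "true"}
print(effective_lazy_copy({}, broker))                                    # True
print(effective_lazy_copy({"remote.copy.lazy.enable": "false"}, broker))  # False
print(effective_lazy_copy({}, {}))                                        # False
```

With the broker-level flag set, new topics pick up lazy copy without any per-topic setting, which is the usability point Kamal raised, while a single topic can still opt out.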

Thanks!

Regards
Jian


On Wed, Jan 28, 2026 at 13:30 Kamal Chandraprakash <[email protected]> wrote:

> Hi Jian,
>
> Thanks for the response.
>
> 1. It is good to maintain the configs both at the topic and broker level.
>
> Topic config: `remote.copy.lazy.enable`
> Broker config: `log.remote.copy.lazy.enable`
>
> If a user wants to enable the lazy copy behaviour for all the topics
> (including the new ones), then they can set
> the broker level config: `log.remote.copy.lazy.enable` to true. Otherwise,
> it will be hard for the user to create
> the new topics with this config set when remote storage is enabled.
>
> > https://github.com/apache/kafka/pull/21361
>
> Left a few comments on the PR. Please take a look. Thanks for the PR!
>
> Thanks,
> Kamal
>
>
> On Mon, Jan 26, 2026 at 7:32 PM jian fu <[email protected]> wrote:
>
> > Hi Kamal:
> >
> > Thank you for your thorough review. Let me respond to each point:
> >
> >
> > Kamal 1: Shall we rename the config `remote.log.latest.enable` to
> > `remote.copy.lazy.enable`?
> >
> > Jian: I think you may mean "remote.log.copy.lazy.enable", right? Other
> > configs in the same style, for reference:
> > remote.log.copy.disable
> > remote.log.delete.on.disable
> >
> > I think your proposed name is a little better than
> > "remote.log.latest.enable":
> > remote.log.latest.enable focuses on the "result", and it is confusing
> > because the active segment is still never uploaded.
> > remote.log.copy.lazy.enable focuses on the "process"; its possible
> > confusion is that it suggests "all segments are uploaded lazily", since
> > every local segment must wait until it is no longer active.
> >
> > So I am happy to adopt your proposal, but I would like to wait for more
> > comments on whether we keep the current name or adopt
> > "remote.log.copy.lazy.enable".
> >
> >
> > Kamal 2: Do we want to have an equivalent broker config to enable the
> > feature for all the topics in the cluster?
> > Jian: I think we can keep the current status, especially if the name is
> > changed to remote.log.copy.lazy.enable, because the other two configs
> > in the same style are not broker-level:
> > remote.log.copy.disable
> > remote.log.delete.on.disable
> > However, it could be made broker-level with only a few code changes, so I
> > am not sure whether it is worth doing or whether we should leave it as is.
> > I am waiting for your and others' comments.
> >
> > Kamal 3: When the remote copy is configured to be lazy, what is the
> > behaviour when local and complete retention values are set to the same?
> >
> > Jian: This is a very interesting corner case (I suspect few people
> > configure it this way, but it is worth analysing). Let me dig into it:
> >
> > Actually, this is a valid configuration, because the current code only
> > requires local retention not to exceed remote retention. If they are
> > equal, it is better to skip the upload, as you mentioned, since the
> > segment would be deleted immediately after being uploaded to remote
> > storage.
> >
> > However, if we do not upload it to remote storage, the local segment will
> > not be deleted, because local deletion waits for the highest offset in
> > remote storage to be updated after the upload.
> >
> > Moreover, if we skip the upload but directly update the highest offset in
> > remote storage, it becomes ambiguous whether the segment has already been
> > uploaded or not.
> >
> > Therefore, I came up with a solution. The demo PR is:
> > https://github.com/apache/kafka/pull/21361
> >
> > The idea is to skip the upload but still update the LogStartOffset.
> >
> > Although this is a corner case, the solution also addresses another
> > issue: if the remote storage service is unavailable for a long time,
> > local segments may never be deleted, even after they exceed the
> > retention time.
> >
> > I have added this corner case and its analysis to the KIP.
> > Thanks.
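
To make the corner case concrete, here is a simplified sketch of the per-segment decision described above. It assumes time-based retention only, with finite values already resolved, and that in lazy mode a non-active segment becomes eligible for upload once its age reaches local retention. Segment, Action, lazy_copy_action and new_log_start_offset are hypothetical names for illustration; the actual logic lives in the draft PR and may differ.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    KEEP_LOCAL = "keep local"                # not eligible for upload yet
    UPLOAD = "upload to remote"              # normal lazy upload
    SKIP_UPLOAD_ADVANCE_LSO = "skip upload"  # corner case: only advance log start offset


@dataclass
class Segment:
    end_offset: int  # last offset contained in the segment (inclusive)
    age_ms: int      # assumed: time since the segment's largest record timestamp
    active: bool


def lazy_copy_action(seg: Segment, local_retention_ms: int,
                     total_retention_ms: int) -> Action:
    """Hypothetical lazy-copy decision for one segment (time-based retention only)."""
    if seg.active:
        return Action.KEEP_LOCAL  # the active segment is never uploaded
    if seg.age_ms < local_retention_ms:
        return Action.KEEP_LOCAL  # still inside local retention: defer the upload
    if local_retention_ms >= total_retention_ms:
        # Local retention cannot exceed total retention, so this is the
        # "equal" case: a remote copy would be deleted right after upload.
        return Action.SKIP_UPLOAD_ADVANCE_LSO
    return Action.UPLOAD


def new_log_start_offset(seg: Segment) -> int:
    """Offset the log start offset moves to when a skipped segment is dropped."""
    return seg.end_offset + 1


DAY = 24 * 60 * 60 * 1000
old = Segment(end_offset=999, age_ms=2 * DAY, active=False)
print(lazy_copy_action(old, DAY, 3 * DAY))      # Action.UPLOAD
print(lazy_copy_action(old, 2 * DAY, 2 * DAY))  # Action.SKIP_UPLOAD_ADVANCE_LSO
print(new_log_start_offset(old))                # 1000
```

Advancing the log start offset instead of the remote highest offset keeps the bookkeeping of what is actually in remote storage unambiguous, which is the concern raised above.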
> >
> >
> > Hi @Luke Chen <[email protected]>
> > Sorry to trouble you. Since you already voted for this KIP, I am pinging
> > you here. Could you also take a look at questions 1 and 2 and share your
> > thoughts when you are free?
> >
> > Again, thanks for all the comments. They sparked further thoughts, and I
> > look forward to more. Thanks a lot!
> >
> > Regards
> > Jian
> >
> >
> > On Sat, Jan 24, 2026 at 10:37 Kamal Chandraprakash <[email protected]> wrote:
> >
> > > Hi Jian,
> > >
> > > Thanks for the KIP! A few questions:
> > >
> > > 1. Shall we rename the config `remote.log.latest.enable` to
> > > `remote.copy.lazy.enable`?
> > >     The word "latest" somehow relates to the active segment and might
> > > confuse users.
> > >
> > > 2. Do we want to have an equivalent broker config to enable the feature
> > > for all the topics in the cluster?
> > >     remote.copy.lazy.enable / log.remote.copy.lazy.enable
> > >     retention.ms / log.retention.ms
> > >     local.retention.ms / log.local.retention.ms
> > >
> > > 3. When the remote copy is configured to be lazy, What is the behaviour
> > > when local and complete retention values are set to the same?
> > >     Do we upload the data to remote, then immediately delete it from
> > > both remote and local? Or, do we skip uploading the segment to remote?
> > >
> > > Thanks,
> > > Kamal
> > >
> > > On Wed, Jan 7, 2026 at 6:18 PM jian fu <[email protected]> wrote:
> > >
> > > > Hi Luke:
> > > >
> > > > Thanks for your comments, and sorry for the delayed response.
> > > > Your understanding is correct on all points.
> > > >
> > > > Your point that in some cases users expect consumers to read with as
> > > > low latency as possible describes exactly my case: I need to keep at
> > > > least one day of low-latency reads so that the business has time to
> > > > handle consumer lag or other issues.
> > > >
> > > >
> > > > Regarding your comments:
> > > >
> > > >
> > > > 1. remote.log.keep.latest or remote.log.latest.enable?
> > > >
> > > > remote.log.latest.enable. Let me correct the typo in the KIP content;
> > > > the code already uses the right name.
> > > >
> > > > 2. About the configuration doc: "Determines whether to upload all
> > > > segments to remote storage including the latest ones within local
> > > > retention." Why do we allow uploading the latest log segment to remote
> > > > storage? The latest log segment is the active log segment, right?
> > > >
> > > > Sorry, that is my mistake. The diagram in the KIP is correct, but the
> > > > config doc is wrong. It should be:
> > > > "Determines whether to upload all non-active segments to remote
> > > > storage, including those still within local retention."
> > > >
> > > >
> > > > I have updated the KIP content and the related code accordingly.
> > > >
> > > > Thanks a lot for your review; please take another look.
> > > >
> > > > Regards
> > > > Jian
> > > >
> > > >
> > > > On Tue, Jan 6, 2026 at 17:50 Luke Chen <[email protected]> wrote:
> > > >
> > > > > Hi Jian,
> > > > >
> > > > > Thanks for the KIP!
> > > > >
> > > > > So, your goal is:
> > > > > 1. to let consumers reading hot data keep reading from local
> > > > > storage, and
> > > > > 2. to avoid duplicating data in local and remote storage as much as
> > > > > possible.
> > > > > Is my understanding correct?
> > > > >
> > > > > Currently, tiered storage keeps data locally for the local retention
> > > > > time/size because we don't want consumers reading hot data to suffer
> > > > > high latency. During this period, the duplication between local and
> > > > > remote storage is indeed a wasted cost, although I agree the cost
> > > > > should not be huge, because local retention is usually not set very
> > > > > high. However, in some cases users expect consumers to read with as
> > > > > low latency as possible, so local retention is set to a high value
> > > > > and only "very cold" data is expected to be stored in remote storage.
> > > > > In this case, this KIP should be helpful.
> > > > >
> > > > >
> > > > > Comments:
> > > > > 1. remote.log.keep.latest or remote.log.latest.enable?
> > > > > 2. About the configuration doc: "Determines whether to upload all
> > > > > segments to remote storage including the latest ones within local
> > > > > retention." Why do we allow uploading the latest log segment to
> > > > > remote storage? The latest log segment is the active log segment,
> > > > > right?
> > > > >
> > > > >
> > > > > Thank you,
> > > > > Luke
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Sun, Jan 4, 2026 at 10:19 PM jian fu <[email protected]> wrote:
> > > > >
> > > > > > Hi All:
> > > > > >
> > > > > > Happy New Year! Bumping this thread again for further discussion
> > > > > > before the vote starts.
> > > > > > Thanks a lot!
> > > > > >
> > > > > > Regards
> > > > > > Jian
> > > > > >
> > > > > > On Mon, Dec 15, 2025 at 20:00 jian fu <[email protected]> wrote:
> > > > > >
> > > > > > > Hi All:
> > > > > > >
> > > > > > > Bumping this thread for more discussion. I'd really appreciate more
> > > > > > > suggestions on this optional feature for tiered storage. Thanks a
> > > > > > > lot!
> > > > > > >
> > > > > > > Regards
> > > > > > >
> > > > > > > Jian
> > > > > > >
> > > > > > > On Thu, Dec 4, 2025 at 21:54 jian fu <[email protected]> wrote:
> > > > > > >
> > > > > > >> Hi All:
> > > > > > >>
> > > > > > >> I updated the KIP content based on the discussion with Kamal and
> > > > > > >> Haiying:
> > > > > > >> 1. Explicitly emphasized that this is a topic-level optional
> > > > > > >> feature intended for users who prioritize cost.
> > > > > > >> 2. Added a cost-saving calculation example.
> > > > > > >> 3. Added details about the operational drawback of this feature:
> > > > > > >> extra disk capacity is needed in case of a long remote storage
> > > > > > >> outage.
> > > > > > >> 4. Added scenarios where enabling the feature may not be suitable
> > > > > > >> or beneficial, such as topics with a very large remote:local
> > > > > > >> retention ratio.
> > > > > > >>
> > > > > > >> Thanks again for joining the discussion.
> > > > > > >>
> > > > > > >> Regards
> > > > > > >> Jian
> > > > > > >>
> > > > > > >> On Tue, Dec 2, 2025 at 20:27 jian fu <[email protected]> wrote:
> > > > > > >>
> > > > > > >>> Hi Kamal:
> > > > > > >>>
> > > > > > >>> I think I understand what you mean now. I've updated the picture
> > > > > > >>> in the link
> > > > > > >>> (https://github.com/apache/kafka/pull/20913#issuecomment-3601274230).
> > > > > > >>> Could you help double-check whether we've reached the same
> > > > > > >>> understanding?
> > > > > > >>> In short, the drawback of this KIP is that, during a long remote
> > > > > > >>> storage outage, it will occupy more local disk. The maximum extra
> > > > > > >>> usage equals the redundant part we are saving. After the outage
> > > > > > >>> is resolved, disk usage returns to the original level.
> > > > > > >>> Please correct me if my understanding is wrong! Thanks again.
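
The drawback summarized above can be checked with a toy model: one segment rolls per hour, local retention is a fixed number of hours, total retention is assumed to be much longer, and remote storage is unavailable for a range of hours. An uploaded segment is deleted locally once it ages past local retention; eager mode uploads every rolled segment, while lazy mode waits until the segment ages past local retention. The simulate function and its parameters are hypothetical simplifications, not the broker's actual upload or cleanup logic.

```python
def simulate(lazy: bool, local_retention_h: int, outage: range, hours: int) -> list[int]:
    """Return the number of local (non-active) segments at the end of each hour."""
    local: dict[int, bool] = {}  # roll hour -> already uploaded to remote?
    history = []
    for now in range(hours):
        local[now] = False  # one segment rolls (becomes non-active) every hour
        remote_up = now not in outage
        for rolled in list(local):
            age = now - rolled
            eligible = not lazy or age >= local_retention_h
            if remote_up and not local[rolled] and eligible:
                local[rolled] = True  # copied to remote storage
            if local[rolled] and age >= local_retention_h:
                del local[rolled]  # safe to delete: a remote copy exists
        history.append(len(local))
    return history


outage = range(100, 148)  # 48-hour remote storage outage
eager = simulate(False, 24, outage, 200)
lazy = simulate(True, 24, outage, 200)
print(eager[99], lazy[99])    # 24 24  (steady state: same local disk)
print(eager[147], lazy[147])  # 48 72  (end of the outage)
print(max(l - e for l, e in zip(lazy, eager)))  # 24  (one local-retention window)
print(eager[148], lazy[148])  # 24 24  (back to normal after recovery)
```

In this model the extra local disk never exceeds one local-retention window of data and both modes converge once uploads resume. It also reflects Kamal's point that eager upload lets pre-outage segments be deleted, which buys the admin time before the disk starts to grow.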
> > > > > > >>>
> > > > > > >>> Regards
> > > > > > >>> Jian
> > > > > > >>>
> > > > > > >>> On Tue, Dec 2, 2025 at 19:29 Kamal Chandraprakash <[email protected]> wrote:
> > > > > > >>>
> > > > > > >>>> The already uploaded segments are eligible for deletion from the
> > > > > > >>>> broker. So, when remote storage is down, those segments can
> > > > > > >>>> still be deleted as per the local retention settings, and new
> > > > > > >>>> segments can occupy that space.
> > > > > > >>>> This gives the Admin more time to act when remote storage is
> > > > > > >>>> down for a longer time.
> > > > > > >>>>
> > > > > > >>>> This is from a reliability perspective.
> > > > > > >>>>
> > > > > > >>>> On Tue, Dec 2, 2025 at 4:47 PM jian fu <[email protected]> wrote:
> > > > > > >>>>
> > > > > > >>>> > Hi Kamal and Haiying Cai:
> > > > > > >>>> >
> > > > > > >>>> > You may have noticed that my Kafka clusters use 1 day local +
> > > > > > >>>> > 3-7 days remote retention, while Haiying Cai's configuration is
> > > > > > >>>> > 3 hours local + 3 days remote.
> > > > > > >>>> >
> > > > > > >>>> > Let me explain my configuration in more detail.
> > > > > > >>>> > I try to avoid the extra latency of delayed consumers reading
> > > > > > >>>> > from remote storage. Some applications may run into unexpected
> > > > > > >>>> > issues, and we need to give them enough time to handle them.
> > > > > > >>>> > During that period, we don't want consumers to read from remote
> > > > > > >>>> > storage and hurt the whole Kafka cluster. So one day is our
> > > > > > >>>> > expectation.
> > > > > > >>>> >
> > > > > > >>>> > I saw this statement in Haiying Cai's KIP-1248:
> > > > > > >>>> > "Currently, when a new consumer or a fallen-off consumer
> > > > > > >>>> > requires fetching messages from a while ago, and those messages
> > > > > > >>>> > are no longer present in the Kafka broker's local storage, the
> > > > > > >>>> > broker must download the message from the remote tiered storage
> > > > > > >>>> > and subsequently transfer the data back to the consumer."
> > > > > > >>>> > Extending the local retention time is how we try to avoid this
> > > > > > >>>> > issue. (Here, we don't consider the case where a new consumer
> > > > > > >>>> > uses the earliest offset reset strategy; that rarely happens in
> > > > > > >>>> > our cases.)
> > > > > > >>>> >
> > > > > > >>>> > So, with my configuration, one day's worth of duplicated
> > > > > > >>>> > segments is wasted in remote storage. We don't use them for
> > > > > > >>>> > real-time analytics, nor do we care about fast broker
> > > > > > >>>> > bootstrapping or similar benefits. That is why I propose this
> > > > > > >>>> > KIP as a topic-level optional feature to help us reduce waste
> > > > > > >>>> > and save money.
> > > > > > >>>> >
> > > > > > >>>> > Regards
> > > > > > >>>> > Jian
> > > > > > >>>> >
> > > > > > >>>> > On Tue, Dec 2, 2025 at 18:42 jian fu <[email protected]> wrote:
> > > > > > >>>> >
> > > > > > >>>> > > Hi Kamal:
> > > > > > >>>> > >
> > > > > > >>>> > > Thanks for joining this discussion. Let me try to clarify my
> > > > > > >>>> > > understanding of your good questions:
> > > > > > >>>> > >
> > > > > > >>>> > > 1  Kamal: Do you also have to update the RemoteCopy lag segments
> > > > > > >>>> > > and bytes metric?
> > > > > > >>>> > >     Jian: The code only delays the upload time for local segments,
> > > > > > >>>> > > so it seems there is no need to change the lag segments or bytes
> > > > > > >>>> > > metrics, right?
> > > > > > >>>> > >
> > > > > > >>>> > > 2  Kamal: As Haiying mentioned, the segments get eventually
> > > > > > >>>> > > uploaded to remote so not sure about the benefit of this
> > > > > > >>>> > > proposal. And, remote storage cost is considered as low when
> > > > > > >>>> > > compared to broker local-disk.
> > > > > > >>>> > >     Jian: The cost benefit is about the total occupied size. Take
> > > > > > >>>> > > AWS S3 as an example: the tiered price is 0.02 USD per GB (you
> > > > > > >>>> > > can refer to https://calculator.aws/#/createCalculator/S3).
> > > > > > >>>> > > It is cheaper than local disk. As I mentioned, the savings depend
> > > > > > >>>> > > on the ratio of local to remote retention time. If you set a long
> > > > > > >>>> > > remote retention time, the benefit is small: it just avoids
> > > > > > >>>> > > waste rather than saving significant cost.
> > > > > > >>>> > > That is why I made it a topic-level optional config instead of
> > > > > > >>>> > > a default feature.
> > > > > > >>>> > >
> > > > > > >>>> > > 3  Kamal: It provides some cushion during third-party object
> > > > > > >>>> > > storage downtime.
> > > > > > >>>> > >     Jian: I drew a picture to try to understand the logic
> > > > > > >>>> > > (https://github.com/apache/kafka/pull/20913#issuecomment-3601274230).
> > > > > > >>>> > > Could you check whether my understanding is right? It seemed to
> > > > > > >>>> > > me that there is no difference between the two, so maybe we need
> > > > > > >>>> > > to discuss this question more. The only difference may be a small
> > > > > > >>>> > > temporary increase in local disk usage due to the delayed
> > > > > > >>>> > > upload. That is why, in the original proposal, I wanted to upload
> > > > > > >>>> > > N-1 segments, but that does not seem to add much value.
> > > > > > >>>> > >
> > > > > > >>>> > > BTW, I want to clarify one basic rule: this feature does not
> > > > > > >>>> > > change the default behavior, and the savings are not large in
> > > > > > >>>> > > all cases. It is suitable for topics with a low remote/local
> > > > > > >>>> > > retention ratio, such as 7 days/1 day or 3 days/1 day.
> > > > > > >>>> > > Finally, thanks again for your time and your comments. All the
> > > > > > >>>> > > questions are valid and help us think more about it.
> > > > > > >>>> > >
> > > > > > >>>> > > Regards
> > > > > > >>>> > > Jian
> > > > > > >>>> > >
> > > > > > >>>> > >
> > > > > > >>>> > > On Tue, Dec 2, 2025 at 17:41 Kamal Chandraprakash <[email protected]> wrote:
> > > > > > >>>> > >
> > > > > > >>>> > >> 1. Do you also have to update the RemoteCopy lag segments and
> > > > > > >>>> > >> bytes metric?
> > > > > > >>>> > >> 2. As Haiying mentioned, the segments get eventually uploaded to
> > > > > > >>>> > >> remote so not sure about the benefit of this proposal. And,
> > > > > > >>>> > >> remote storage cost is considered as low when compared to
> > > > > > >>>> > >> broker local-disk.
> > > > > > >>>> > >> It provides some cushion during third-party object storage
> > > > > > >>>> > >> downtime.
> > > > > > >>>> > >>
> > > > > > >>>> > >>
> > > > > > >>>> > >> On Tue, Dec 2, 2025 at 2:45 PM Kamal Chandraprakash <[email protected]> wrote:
> > > > > > >>>> > >>
> > > > > > >>>> > >> > Hi Jian,
> > > > > > >>>> > >> >
> > > > > > >>>> > >> > Thanks for the KIP!
> > > > > > >>>> > >> >
> > > > > > >>>> > >> > When remote storage is unavailable for a few hours, lazy upload
> > > > > > >>>> > >> > carries a risk of the broker disk filling up quickly.
> > > > > > >>>> > >> > The Admin has to configure the local retention configs
> > > > > > >>>> > >> > properly. With eager upload, the disk utilization won't grow
> > > > > > >>>> > >> > until the local retention time has passed (the expectation is
> > > > > > >>>> > >> > that all the passive segments are already uploaded), which
> > > > > > >>>> > >> > gives the Admin some time to take action based on the
> > > > > > >>>> > >> > situation.
> > > > > > >>>> > >> >
> > > > > > >>>> > >> > --
> > > > > > >>>> > >> > Kamal
> > > > > > >>>> > >> >
> > > > > > >>>> > >> > On Tue, Dec 2, 2025 at 10:28 AM Haiying Cai via dev <[email protected]> wrote:
> > > > > > >>>> > >> >
> > > > > > >>>> > >> >> Jian,
> > > > > > >>>> > >> >>
> > > > > > >>>> > >> >> I understand this is an optional feature and the cost saving
> > > > > > >>>> > >> >> depends on the ratio between local.retention.ms and total
> > > > > > >>>> > >> >> retention.ms.
> > > > > > >>>> > >> >>
> > > > > > >>>> > >> >> In our setup, we have local retention set to 3 hours and total
> > > > > > >>>> > >> >> retention set to 3 days, so the saving is not going to be
> > > > > > >>>> > >> >> significant.
> > > > > > >>>> > >> >>
> > > > > > >>>> > >> >> On 2025/12/01 05:33:11 jian fu wrote:
> > > > > > >>>> > >> >> > Hi Haiying Cai,
> > > > > > >>>> > >> >> >
> > > > > > >>>> > >> >> > Thanks for joining the discussion on this KIP. All of your
> > > > > > >>>> > >> >> > concerns are valid, and that is exactly why I introduced a
> > > > > > >>>> > >> >> > topic-level configuration to make this feature optional. By
> > > > > > >>>> > >> >> > default, the behavior remains unchanged. Only users who are not
> > > > > > >>>> > >> >> > pursuing faster broker boot time or other optimizations, and who
> > > > > > >>>> > >> >> > care more about cost, would enable this option on some topics to
> > > > > > >>>> > >> >> > save resources.
> > > > > > >>>> > >> >> >
> > > > > > >>>> > >> >> > Regarding cost itself: the actual savings depend on the ratio
> > > > > > >>>> > >> >> > between local retention and remote retention. In the KIP/PR, I
> > > > > > >>>> > >> >> > provided a test example: if we configure 1 day of local retention
> > > > > > >>>> > >> >> > and 2 days of remote retention, we can save about 50%.
> > > > > > >>>> > >> >> > Realistically, I don't think anyone would boldly set local
> > > > > > >>>> > >> >> > retention to a very small value (such as minutes) due to the
> > > > > > >>>> > >> >> > latency concerns associated with remote storage. So, in short,
> > > > > > >>>> > >> >> > the feature will help reduce cost, and the amount saved simply
> > > > > > >>>> > >> >> > depends on the ratio. Taking my company's usage as a real
> > > > > > >>>> > >> >> > example, we configure most topics with 1 day of local retention
> > > > > > >>>> > >> >> > and 3–7 days of remote storage (3 days for topics with log/metric
> > > > > > >>>> > >> >> > usage, 7 days for topics with normal business usage), and we
> > > > > > >>>> > >> >> > don't care about boot speed and similar concerns. This KIP allows
> > > > > > >>>> > >> >> > us to save 1/7 to 1/3 of the total disk usage for remote storage.
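
The savings quoted above follow from simple arithmetic, assuming a constant ingest rate and that "remote retention" means total retention (retention.ms): eager upload keeps the whole retention window in remote storage, while lazy upload keeps only the part older than local retention, so the avoided fraction is local / total and the absolute amount avoided is one local-retention window of data. The helper names are hypothetical, and the model ignores segment granularity (roll interval, active segment).

```python
from fractions import Fraction


def remote_saving_fraction(local_retention_h: int, total_retention_h: int) -> Fraction:
    """Fraction of remote storage avoided by lazy upload: local / total retention."""
    if total_retention_h <= 0 or not 0 <= local_retention_h <= total_retention_h:
        raise ValueError("expected 0 <= local retention <= total retention, total > 0")
    return Fraction(local_retention_h, total_retention_h)


def remote_gb_saved(ingest_gb_per_h: float, local_retention_h: int) -> float:
    """Steady-state remote data avoided: one local-retention window of ingest."""
    return ingest_gb_per_h * local_retention_h


for local_h, total_h in [(24, 48), (24, 72), (24, 168), (3, 72)]:
    print(f"{local_h}h local / {total_h}h total ->", remote_saving_fraction(local_h, total_h))
# 24h local / 48h total -> 1/2
# 24h local / 72h total -> 1/3
# 24h local / 168h total -> 1/7
# 3h local / 72h total -> 1/24
```

The last row corresponds to Haiying's 3-hour/3-day setup, where the saving is only about 4%, which is why the feature mainly pays off for topics with a low total/local retention ratio.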
> > > > > > >>>> > >> >> >
> > > > > > >>>> > >> >> > Anyway, this is just a topic-level optional feature that does not
> > > > > > >>>> > >> >> > take away the benefits of the current design. Thanks again for
> > > > > > >>>> > >> >> > the discussion. I can update the KIP to better clarify the
> > > > > > >>>> > >> >> > scenarios where this optional feature is not suitable. Currently,
> > > > > > >>>> > >> >> > I only list real-time analytics as the negative example.
> > > > > > >>>> > >> >> >
> > > > > > >>>> > >> >> > Further discussion is welcome to help make this KIP more
> > > > > > >>>> > >> >> > complete. Thanks!
> > > > > > >>>> > >> >> >
> > > > > > >>>> > >> >> > Regards,
> > > > > > >>>> > >> >> > Jian
> > > > > > >>>> > >> >> >
> > > > > > >>>> > >> >> > On Mon, Dec 1, 2025 at 12:40 Haiying Cai via dev <[email protected]> wrote:
> > > > > > >>>> > >> >> >
> > > > > > >>>> > >> >> > > Jian,
> > > > > > >>>> > >> >> > >
> > > > > > >>>> > >> >> > > Thanks for the contribution. But I feel that uploading the
> > > > > > >>>> > >> >> > > local segment file to remote storage ASAP is advantageous in
> > > > > > >>>> > >> >> > > several scenarios:
> > > > > > >>>> > >> >> > >
> > > > > > >>>> > >> >> > > 1. It enables fast bootstrapping of a new broker. A new broker
> > > > > > >>>> > >> >> > > doesn't have to replicate all the data from the leader broker;
> > > > > > >>>> > >> >> > > it only needs to replicate the data from the tail of the remote
> > > > > > >>>> > >> >> > > log segment to the current end of the topic (LSO), since all
> > > > > > >>>> > >> >> > > the other data is in remote tiered storage and can be
> > > > > > >>>> > >> >> > > downloaded lazily later. This is what KIP-1023 is trying to
> > > > > > >>>> > >> >> > > solve;
> > > > > > >>>> > >> >> > > 2. Although nobody has proposed a KIP to allow a consumer client
> > > > > > >>>> > >> >> > > to read from remote tiered storage directly, doing so would help
> > > > > > >>>> > >> >> > > fallen-behind consumers do catch-up reads or perform backfill.
> > > > > > >>>> > >> >> > > This path allows the consumer backfill to finish without
> > > > > > >>>> > >> >> > > polluting the broker's page cache. The earlier the data is in
> > > > > > >>>> > >> >> > > remote tiered storage, the more advantageous it is for the
> > > > > > >>>> > >> >> > > client.
> > > > > > >>>> > >> >> > >
> > > > > > >>>> > >> >> > > I think in your proposal you are delaying the segment upload,
> > > > > > >>>> > >> >> > > but the file will still be uploaded at a later time. I guess
> > > > > > >>>> > >> >> > > this can save a few hours of storage cost for that file in
> > > > > > >>>> > >> >> > > remote storage; I'm not sure whether that is a significant
> > > > > > >>>> > >> >> > > saving (if the file needs to stay in remote tiered storage for
> > > > > > >>>> > >> >> > > several days or weeks due to the retention policy).
> > > > > > >>>> > >> >> > >
> > > > > > >>>> > >> >> > > On 2025/11/19 13:29:11 jian fu wrote:
> > > > > > >>>> > >> >> > > > Hi everyone, I'd like to start a discussion on KIP-1241. The
> > > > > > >>>> > >> >> > > > goal is to reduce remote storage usage. KIP:
> > > > > > >>>> > >> >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1241%3A+Reduce+tiered+storage+redundancy+with+delayed+upload
> > > > > > >>>> > >> >> > > >
> > > > > > >>>> > >> >> > > > The Draft PR: https://github.com/apache/kafka/pull/20913
> > > > > > >>>> > >> >> > > > Problem:
> > > > > > >>>> > >> >> > > > Currently, Kafka's tiered storage implementation uploads all
> > > > > > >>>> > >> >> > > > non-active local log segments to remote storage immediately,
> > > > > > >>>> > >> >> > > > even when they are still within the local retention period.
> > > > > > >>>> > >> >> > > > This results in redundant storage of the same data in both
> > > > > > >>>> > >> >> > > > local and remote tiers.
> > > > > > >>>> > >> >> > > >
> > > > > > >>>> > >> >> > > > When there is no requirement for real-time analytics or
> > > > > > >>>> > >> >> > > > immediate consumption from remote storage, this has the
> > > > > > >>>> > >> >> > > > following drawbacks:
> > > > > > >>>> > >> >> > > >
> > > > > > >>>> > >> >> > > > 1. Wastes storage capacity and cost: the same data is stored
> > > > > > >>>> > >> >> > > > twice during the local retention window.
> > > > > > >>>> > >> >> > > > 2. Provides no immediate benefit: during the local retention
> > > > > > >>>> > >> >> > > > period, reads prioritize local data, making the remote copy
> > > > > > >>>> > >> >> > > > unnecessary.
> > > > > > >>>> > >> >> > > >
> > > > > > >>>> > >> >> > > >
> > > > > > >>>> > >> >> > > > So, this KIP aims to reduce tiered storage redundancy with
> > > > > > >>>> > >> >> > > > delayed upload.
> > > > > > >>>> > >> >> > > > You can check the test result example here directly:
> > > > > > >>>> > >> >> > > > https://github.com/apache/kafka/pull/20913#issuecomment-3547156286
> > > > > > >>>> > >> >> > > > Looking forward to your feedback!
> > > > > > >>>> > >> >> > > > Best regards,
> > > > > > >>>> > >> >> > > > Jian
> > > > > > >>>> > >> >> > > >
> > > > > > >>>> > >> >> >
> > > > > > >>>> > >> >
> > > > > > >>>> > >> >
> > > > > > >>>> > >>
> > > > > > >>>> > >
> > > > > > >>>> > >
> > > > > > >>>> > >
> > > > > > >>>> > >
> > > > > > >>>> >
> > > > > > >>>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >
> > > > > > > --
> > > > > > > Regards
> > > > > > >
> > > > > > > Fu.Jian
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
