Hi Jian,

Thanks for the response.

1. It is good to maintain the configs at both the topic and broker level.

Topic config: `remote.copy.lazy.enable`
Broker config: `log.remote.copy.lazy.enable`

If a user wants to enable the lazy copy behaviour for all topics (including new ones), they can set the broker-level config `log.remote.copy.lazy.enable` to true. Otherwise, it will be hard for the user to create new topics with this config set when remote storage is enabled.

> https://github.com/apache/kafka/pull/21361

Left a few comments on the PR. Please take a look. Thanks for the PR!

Thanks,
Kamal

On Mon, Jan 26, 2026 at 7:32 PM jian fu <[email protected]> wrote:

> Hi Kamal:
>
> Thank you for your thorough review. Let me give feedback one by one:
>
> > Kamal 1: Shall we rename the config `remote.log.latest.enable` to `remote.copy.lazy.enable`?
>
> Jian: I think you may mean "remote.log.copy.lazy.enable", right? Some other configs for style reference:
> remote.log.copy.disable
> remote.log.delete.on.disable
>
> I think your proposed name is a little better than "remote.log.latest.enable":
> remote.log.latest.enable focuses on the "result", and it is confusing because the active segment is still not allowed to be uploaded.
> remote.log.copy.lazy.enable focuses on the "process", and it may be confusing in that all segments are already uploaded "lazily", because a local segment has to wait until it is no longer active.
>
> So I think I can adopt your proposal, but I would like to wait for more comments from you on whether we keep the current name or adopt "remote.log.copy.lazy.enable".
>
> > Kamal 2. Do we want to have an equivalent broker config to enable the feature for all the topics in the cluster?
>
> Jian: I think we can keep the current status, especially if the name is changed to remote.log.copy.lazy.enable, because the other two configs with the same style are not broker-level:
> remote.log.copy.disable
> remote.log.delete.on.disable
> But it can be changed to the broker level with a small code change.
> So I am a little unsure whether it is worth doing, or whether to leave it as is without any more changes.
> Let's wait for comments from you and others.
>
> > Kamal 3. When the remote copy is configured to be lazy, what is the behaviour when local and complete retention values are set to the same?
>
> Jian: This is a very interesting corner case (I think few people would do this, but it is an interesting case). Let me try to dive deep into it:
>
> Actually, this is a valid case, because the current code only disallows local retention being greater than remote retention. If they are equal, it is better to skip the upload, as you mentioned, since the segment would be immediately deleted after being uploaded to remote storage.
>
> However, if we do not upload it to remote storage, the local segment will not be deleted, because deletion waits for the highest offset in remote storage to be updated after the upload.
>
> Moreover, if we skip the upload but directly update the highest offset in remote storage, it becomes ambiguous whether the segment has already been uploaded or not.
>
> Therefore, I came up with a solution. The demo PR is:
> https://github.com/apache/kafka/pull/21361
>
> The idea is to skip the upload but still update the LogStartOffset.
>
> Although this is a corner case, the solution also helps address another issue: if the remote storage service is unavailable for a long time, local segments may never get deleted, even after they exceed the retention time.
>
> I have added this case to the KIP and described this corner case there. Thanks.
>
> Hi @Luke Chen <[email protected]>
> Sorry to trouble you. Since you have already voted for this KIP, I am pinging you here. Could you also take a look at question 1 and question 2 and give some more comments when you are free?
>
> Again, thanks for all the comments. They sparked further thoughts, and I look forward to additional comments. Thanks a lot!
>
> Regards
> Jian
>
> On Sat, Jan 24, 2026 at 10:37, Kamal Chandraprakash <[email protected]> wrote:
>
> > Hi Jian,
> >
> > Thanks for the KIP! A few questions:
> >
> > 1. Shall we rename the config `remote.log.latest.enable` to `remote.copy.lazy.enable`? The word "latest" somehow relates to the active segment and might confuse users.
> >
> > 2. Do we want to have an equivalent broker config to enable the feature for all the topics in the cluster?
> > remote.copy.lazy.enable / log.remote.copy.lazy.enable
> > retention.ms / log.retention.ms
> > local.retention.ms / log.local.retention.ms
> >
> > 3. When the remote copy is configured to be lazy, what is the behaviour when local and complete retention values are set to the same? Do we upload the data to remote, then immediately delete it from both remote and local? Or do we skip uploading the segment to remote?
> >
> > Thanks,
> > Kamal
> >
> > On Wed, Jan 7, 2026 at 6:18 PM jian fu <[email protected]> wrote:
> >
> > > Hi Luke:
> > >
> > > Thanks for your comments, and I am sorry for the delayed response. Your understanding is entirely right.
> > >
> > > "However in some cases, users are expecting consumer read with low latency as much as possible": that is my case. I need to keep at least one day of low-latency reads so that the business can handle consumer lag or other issues.
> > >
> > > Regarding your comments:
> > >
> > > 1. remote.log.keep.latest or remote.log.latest.enable?
> > >
> > > remote.log.latest.enable. Let me correct the typo in the KIP content; the code is right.
> > >
> > > 2. About the configuration doc: Determines whether to upload all segments to remote storage including the latest ones within local retention. Why do we allow to upload the latest log segment to remote storage? The latest log segment is the active log segment, right?
> > >
> > > Sorry, it is my mistake. The graph in the KIP is right.
> > > But the config doc is wrong. It should be:
> > > "Determines whether to upload all non-active segments to remote storage, including those still within local retention."
> > >
> > > I have updated the KIP content and the related code!
> > >
> > > Thanks a lot for your review; please help review it again.
> > >
> > > Regards
> > > Jian
> > >
> > > On Tue, Jan 6, 2026 at 17:50, Luke Chen <[email protected]> wrote:
> > >
> > > > Hi Jian,
> > > >
> > > > Thanks for the KIP!
> > > >
> > > > So, your goal is to
> > > > 1. allow consumers who are reading the hot data to still read from local storage.
> > > > 2. avoid duplicated data in local and remote storage as much as possible.
> > > > Is my understanding correct?
> > > >
> > > > Currently, tiered storage keeps data locally for the local retention time/size because we don't want consumers who read the hot data to see high latency. During this period, the duplication in local and remote storage is indeed a waste of cost. I also agree the cost should not be that huge, because usually the local retention should not be set too high. However, in some cases, users are expecting consumers to read with low latency as much as possible, so the local retention is set to a high value and only "very cold" data is expected to be stored in remote storage. In this case, this KIP should be helpful.
> > > >
> > > > Comments:
> > > > 1. remote.log.keep.latest or remote.log.latest.enable?
> > > > 2. About the configuration doc: Determines whether to upload all segments to remote storage including the latest ones within local retention. Why do we allow uploading the latest log segment to remote storage? The latest log segment is the active log segment, right?
> > > >
> > > > Thank you,
> > > > Luke
> > > >
> > > > On Sun, Jan 4, 2026 at 10:19 PM jian fu <[email protected]> wrote:
> > > >
> > > > > Hi All:
> > > > >
> > > > > Happy New Year! Bumping this thread again for more discussion before the vote starts. Thanks a lot!
> > > > >
> > > > > Regards
> > > > > Jian
> > > > >
> > > > > On Mon, Dec 15, 2025 at 20:00, jian fu <[email protected]> wrote:
> > > > >
> > > > > > Hi All:
> > > > > >
> > > > > > Bumping this thread for more discussion. I'd really appreciate more suggestions on this optional feature for tiered storage. Thanks a lot!
> > > > > >
> > > > > > Regards
> > > > > >
> > > > > > Jian
> > > > > >
> > > > > > On Thu, Dec 4, 2025 at 21:54, jian fu <[email protected]> wrote:
> > > > > >
> > > > > >> Hi All:
> > > > > >>
> > > > > >> I updated the KIP content according to the discussion with Kamal and Haiying:
> > > > > >> 1. Explicitly emphasized that this is a topic-level optional feature intended for users who prioritize cost.
> > > > > >> 2. Added the cost-saving calculation example.
> > > > > >> 3. Added additional details about the operational drawback of this feature: extra disk expansion is needed in the case of a long remote storage outage.
> > > > > >> 4. Added the scenarios where enabling the feature may not be very suitable or beneficial, such as a topic whose remote:local retention ratio is very large.
> > > > > >>
> > > > > >> Thanks again for joining the discussion.
> > > > > >>
> > > > > >> Regards
> > > > > >> Jian
> > > > > >>
> > > > > >> On Tue, Dec 2, 2025 at 20:27, jian fu <[email protected]> wrote:
> > > > > >>
> > > > > >>> Hi Kamal:
> > > > > >>>
> > > > > >>> I think I understand what you mean now.
> > > > > >>> I've updated the picture in the link (https://github.com/apache/kafka/pull/20913#issuecomment-3601274230).
> > > > > >>> Could you help double-check whether we've reached the same understanding?
> > > > > >>> In short, the drawback of this KIP is that, during a long remote storage outage, it will occupy more disk. The maximum extra usage is the redundant part we are saving.
> > > > > >>> Then, after the outage is resolved, it goes back to where it started.
> > > > > >>> Please correct me if my understanding is wrong! Thanks again.
> > > > > >>>
> > > > > >>> Regards
> > > > > >>> Jian
> > > > > >>>
> > > > > >>> On Tue, Dec 2, 2025 at 19:29, Kamal Chandraprakash <[email protected]> wrote:
> > > > > >>>
> > > > > >>>> The already uploaded segments are eligible for deletion from the broker. So, when remote storage is down, those segments can be deleted as per the local retention settings and new segments can occupy those spaces. This provides more time for the Admin to act when remote storage is down for a longer time.
> > > > > >>>>
> > > > > >>>> This is from a reliability perspective.
> > > > > >>>>
> > > > > >>>> On Tue, Dec 2, 2025 at 4:47 PM jian fu <[email protected]> wrote:
> > > > > >>>>
> > > > > >>>> > Hi Kamal and Haiying Cai:
> > > > > >>>> >
> > > > > >>>> > Maybe you noticed that my Kafka clusters are set to 1 day local + 3 to 7 days remote, whereas Haiying Cai's config is 3 hours local + 3 days remote.
> > > > > >>>> >
> > > > > >>>> > I can explain more about my config.
> > > > > >>>> > I try to avoid the latency of delayed consumers accessing remote storage.
> > > > > >>>> > Some applications may encounter unexpected issues, and we need to give enough time to handle them. During that period, we don't want consumers to access remote storage and hurt the whole Kafka cluster. So one day is our expectation.
> > > > > >>>> >
> > > > > >>>> > I saw one statement in Haiying Cai's KIP-1248:
> > > > > >>>> > "Currently, when a new consumer or a fallen-off consumer requires fetching messages from a while ago, and those messages are no longer present in the Kafka broker's local storage, the broker must download the message from the remote tiered storage and subsequently transfer the data back to the consumer."
> > > > > >>>> > Extending the local retention time is how we try to avoid the issue. (Here, we don't consider the case where a new consumer uses the earliest strategy to consume; it does not happen often in our case.)
> > > > > >>>> >
> > > > > >>>> > So, based on my config, I see one day's worth of duplicated segments wasted in remote storage, since I don't use them for real-time analytics or care about fast reboot or anything else. So I propose this KIP to add a topic-level optional feature to help us reduce waste and save money.
> > > > > >>>> >
> > > > > >>>> > Regards
> > > > > >>>> > Jian
> > > > > >>>> >
> > > > > >>>> > On Tue, Dec 2, 2025 at 18:42, jian fu <[email protected]> wrote:
> > > > > >>>> >
> > > > > >>>> > > Hi Kamal:
> > > > > >>>> > >
> > > > > >>>> > > Thanks for joining this discussion.
> > > > > >>>> > > Let me try to clarify my understanding of your good questions:
> > > > > >>>> > >
> > > > > >>>> > > 1. Kamal: Do you also have to update the RemoteCopy lag segments and bytes metric?
> > > > > >>>> > > Jian: The code just delays the upload time for local segments. So it seems there is no need to change the lag segments or bytes metrics, right?
> > > > > >>>> > >
> > > > > >>>> > > 2. Kamal: As Haiying mentioned, the segments get eventually uploaded to remote so not sure about the benefit of this proposal. And, remote storage cost is considered as low when compared to broker local-disk.
> > > > > >>>> > > Jian: The cost benefit is about the total size occupied. Take AWS S3 as an example: the tiered price for 1 GB is 0.02 USD (you can refer to https://calculator.aws/#/createCalculator/S3). It is cheaper than local disk. So, as I mentioned, the money saved depends on the ratio of local to remote retention time. If you set the remote retention to a long time, the benefit is small; it just avoids waste rather than saving significant cost. So I made it a topic-level optional config instead of a default feature.
> > > > > >>>> > >
> > > > > >>>> > > 3. Kamal: It provides some cushion during third-party object storage downtime.
> > > > > >>>> > > Jian: I drew a picture to try to understand the logic (https://github.com/apache/kafka/pull/20913#issuecomment-3601274230).
> > > > > >>>> > > You can help check whether my understanding is right. It seemed to me that there is no difference between them, so maybe we need to discuss this question more. The only difference may be that we temporarily use a little more local disk due to the delayed upload to remote. So in the original proposal, I wanted to upload N-1 segments. But it seems the value is not much.
> > > > > >>>> > >
> > > > > >>>> > > BTW, I want to clarify one basic rule: this feature does not change the default behavior, and the saving is not very big in all cases. It is suitable for the subset of topics that set a low remote/local ratio, such as 7 days/1 day or 3 days/1 day.
> > > > > >>>> > > Finally, thanks again for your time and your comments. All the questions are valid and help us think more about it.
> > > > > >>>> > >
> > > > > >>>> > > Regards
> > > > > >>>> > > Jian
> > > > > >>>> > >
> > > > > >>>> > > On Tue, Dec 2, 2025 at 17:41, Kamal Chandraprakash <[email protected]> wrote:
> > > > > >>>> > >
> > > > > >>>> > >> 1. Do you also have to update the RemoteCopy lag segments and bytes metric?
> > > > > >>>> > >> 2. As Haiying mentioned, the segments get eventually uploaded to remote so not sure about the benefit of this proposal. And, remote storage cost is considered as low when compared to broker local-disk.
> > > > > >>>> > >> It provides some cushion during third-party object storage downtime.
> > > > > >>>> > >>
> > > > > >>>> > >> On Tue, Dec 2, 2025 at 2:45 PM Kamal Chandraprakash <[email protected]> wrote:
> > > > > >>>> > >>
> > > > > >>>> > >> > Hi Jian,
> > > > > >>>> > >> >
> > > > > >>>> > >> > Thanks for the KIP!
> > > > > >>>> > >> >
> > > > > >>>> > >> > When remote storage is unavailable for a few hours, then with lazy upload there is a risk of the broker disk getting full soon. The Admin has to configure the local retention configs properly. With eager upload, the disk utilization won't grow until the local retention time (expectation is that all the passive segments are uploaded). And, it provides some time for the Admin to take any action based on the situation.
> > > > > >>>> > >> >
> > > > > >>>> > >> > --
> > > > > >>>> > >> > Kamal
> > > > > >>>> > >> >
> > > > > >>>> > >> > On Tue, Dec 2, 2025 at 10:28 AM Haiying Cai via dev <[email protected]> wrote:
> > > > > >>>> > >> >
> > > > > >>>> > >> >> Jian,
> > > > > >>>> > >> >>
> > > > > >>>> > >> >> Understood that this is an optional feature and the cost saving depends on the ratio between local.retention.ms and total retention.ms.
> > > > > >>>> > >> >>
> > > > > >>>> > >> >> In our setup, we have local.retention set to 3 hours and total retention set to 3 days, so the saving is not going to be significant.
> > > > > >>>> > >> >>
> > > > > >>>> > >> >> On 2025/12/01 05:33:11 jian fu wrote:
> > > > > >>>> > >> >> > Hi Haiying Cai,
> > > > > >>>> > >> >> >
> > > > > >>>> > >> >> > Thanks for joining the discussion for this KIP. All of your concerns are valid, and that is exactly why I introduced a topic-level configuration to make this feature optional. This means that, by default, the behavior remains unchanged. Only when users are not pursuing faster broker boot time or other optimizations, and care more about cost, would they enable this option on some topics to save resources.
> > > > > >>>> > >> >> >
> > > > > >>>> > >> >> > Regarding cost itself: the actual savings depend on the ratio between local retention and remote retention. In the KIP/PR, I provided a test example: if we configure 1 day of local retention and 2 days of remote retention, we can save about 50%. And realistically, I don't think anyone would boldly set local retention to a very small value (such as minutes) due to the latency concerns associated with remote storage. So in short, the feature will help reduce cost, and the amount saved simply depends on the ratio.
> > > > > >>>> > >> >> > Take my company's usage as a real example: we configure most topics with 1 day of local retention and 3-7 days of remote storage (3 days for topics with log/metric usage, 7 days for topics with normal business usage), and we don't care about boot speed or anything else. This KIP allows us to save 1/7 to 1/3 of the total disk usage for remote storage.
> > > > > >>>> > >> >> >
> > > > > >>>> > >> >> > Anyway, this is just a topic-level optional feature that does not take away the benefits of the current design. Thanks again for the discussion. I can update the KIP to better clarify the scenarios where this optional feature is not suitable. Currently, I only listed real-time analytics as the negative example.
> > > > > >>>> > >> >> >
> > > > > >>>> > >> >> > Further discussion is welcome to help make this KIP more complete. Thanks!
> > > > > >>>> > >> >> >
> > > > > >>>> > >> >> > Regards,
> > > > > >>>> > >> >> > Jian
> > > > > >>>> > >> >> >
> > > > > >>>> > >> >> > On Mon, Dec 1, 2025 at 12:40, Haiying Cai via dev <[email protected]> wrote:
> > > > > >>>> > >> >> >
> > > > > >>>> > >> >> > > Jian,
> > > > > >>>> > >> >> > >
> > > > > >>>> > >> >> > > Thanks for the contribution.
> > > > > >>>> > >> >> > > But I feel that uploading the local segment file to remote storage ASAP is advantageous in several scenarios:
> > > > > >>>> > >> >> > >
> > > > > >>>> > >> >> > > 1. It enables fast bootstrapping of a new broker. A new broker doesn't have to replicate all the data from the leader broker; it only needs to replicate the data from the tail of the remote log segment to the tail of the current end of the topic (LSO), since all the other data are in the remote tiered storage and it can download them later lazily. This is what KIP-1023 is trying to solve;
> > > > > >>>> > >> >> > > 2. Although nobody has proposed a KIP to allow a consumer client to read from the remote tiered storage directly, this would help a fall-behind consumer to do catch-up reads or perform a backfill. This path allows the consumer backfill to finish without polluting the broker's page cache. The earlier the data is on the remote tiered storage, the more advantageous it is for the client.
> > > > > >>>> > >> >> > >
> > > > > >>>> > >> >> > > I think in your proposal, you are delaying uploading the segment, but the file will still be uploaded at a later time. I guess this can save a few hours of storage cost for that file in the remote storage; not sure whether that is a significant cost saving (if the file needs to stay in remote tiered storage for several days or weeks due to the retention policy).
> > > > > >>>> > >> >> > >
> > > > > >>>> > >> >> > > On 2025/11/19 13:29:11 jian fu wrote:
> > > > > >>>> > >> >> > > > Hi everyone, I'd like to start a discussion on KIP-1241. The goal is to reduce remote storage usage.
> > > > > >>>> > >> >> > > >
> > > > > >>>> > >> >> > > > KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-1241%3A+Reduce+tiered+storage+redundancy+with+delayed+upload
> > > > > >>>> > >> >> > > >
> > > > > >>>> > >> >> > > > The Draft PR: https://github.com/apache/kafka/pull/20913
> > > > > >>>> > >> >> > > >
> > > > > >>>> > >> >> > > > Problem: Currently, Kafka's tiered storage implementation uploads all non-active local log segments to remote storage immediately, even when they are still within the local retention period.
> > > > > >>>> > >> >> > > > This results in redundant storage of the same data in both local and remote tiers.
> > > > > >>>> > >> >> > > >
> > > > > >>>> > >> >> > > > When there is no requirement for real-time analytics or immediate consumption based on remote storage, this has the following drawbacks:
> > > > > >>>> > >> >> > > >
> > > > > >>>> > >> >> > > > 1. Wastes storage capacity and costs: The same data is stored twice during the local retention window
> > > > > >>>> > >> >> > > > 2. Provides no immediate benefit: During the local retention period, reads prioritize local data, making the remote copy unnecessary
> > > > > >>>> > >> >> > > >
> > > > > >>>> > >> >> > > > So this KIP proposes to reduce tiered storage redundancy with delayed upload.
> > > > > >>>> > >> >> > > > You can check the test result example here directly:
> > > > > >>>> > >> >> > > > https://github.com/apache/kafka/pull/20913#issuecomment-3547156286
> > > > > >>>> > >> >> > > > Looking forward to your feedback!
> > > > > >>>> > >> >> > > > Best regards,
> > > > > >>>> > >> >> > > > Jian
> > > > > >
> > > > > > --
> > > > > > Regards
> > > > > >
> > > > > > Fu.Jian
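
The savings figures quoted in the thread (about 50% for 1 day local / 2 days total, 1/7 to 1/3 for 1 day local / 3-7 days total, and "not significant" for 3 hours local / 3 days total) are all consistent with a simple ratio of local retention to total retention. Below is a minimal sketch of that arithmetic, not taken from the KIP or the PR. It assumes a constant ingest rate (so stored bytes are proportional to retention time) and that a lazily uploaded segment reaches remote storage only when its local retention expires. The function name `remote_redundancy_saving` is hypothetical.

```python
from fractions import Fraction


def remote_redundancy_saving(local_retention_hours: int, total_retention_hours: int) -> Fraction:
    """Fraction of remote storage saved by delaying upload until local retention expires.

    Assumption: constant ingest rate, so bytes stored are proportional to time.
    Eager upload keeps a segment in remote storage for roughly the total retention;
    lazy upload keeps it there for roughly (total - local). The saving is local / total.
    """
    if not 0 < local_retention_hours <= total_retention_hours:
        raise ValueError("expected 0 < local retention <= total retention")
    return Fraction(local_retention_hours, total_retention_hours)


# Retention settings mentioned in the thread, expressed in hours.
examples = [
    ("1 day local / 2 days total", 24, 48),
    ("1 day local / 3 days total", 24, 72),
    ("1 day local / 7 days total", 24, 168),
    ("3 hours local / 3 days total", 3, 72),
]

for label, local_hours, total_hours in examples:
    saving = remote_redundancy_saving(local_hours, total_hours)
    print(f"{label}: saves {saving} of remote storage ({float(saving):.1%})")
```

Under these assumptions, the closer local retention is to total retention, the larger the saving. When the two are equal, the ratio reaches 1, which matches the corner case discussed above where the upload could be skipped entirely.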
