Hi Kamal,

Thanks for raising this.
Currently, only the existing version of RemoteLogSegmentMetadataUpdateRecord lacks those fields. We rely on the time-based retention policy to clean those records up, and this does not affect the ability to rebuild the RemoteLogMetadataCache. Cache reconstruction should still work correctly because it depends on the record value, and we have not removed any fields from the value.

Regarding the scenario where there are 10 remote segments and the __remote_log_metadata topic contains only COPY_SEGMENT_FINISHED events: the COPY_SEGMENT_STARTED events will not be compacted in this case, since messages with a null key are not subject to compaction. Once the older-format messages have been cleaned up by the time-based retention policy and compaction is enabled, records with the same key will be compacted asynchronously and correctly.

Given this, I don't believe we need to introduce a separate key for COPY_SEGMENT_STARTED events. Happy to jump on a call if that's an easier way to discuss this further.

Best,
Lijun Tong

On Sun, Apr 5, 2026 at 10:56 Lijun Tong wrote:

Hey Kamal,

I am not entirely clear on the question you raised above. I am happy to jump on a call to discuss further; I live in the PST time zone. Maybe we can meet online through Google Meet?

Thanks,
Lijun Tong

On Wed, Apr 1, 2026 at 02:38 Kamal Chandraprakash wrote:

Hi Lijun,

Thanks for the update! I'm still not clear on this.

The RemoteLogSegmentMetadataUpdateRecord does not contain the following fields that RemoteLogSegmentMetadataRecord has:

- startOffset
- endOffset (will be added as a tagged field)
- MaxTimestampMs
- SegmentLeaderEpochs
- SegmentSizeInBytes
- TxnIndexEmpty

When a broker gets restarted, will it be able to rebuild the RemoteLogMetadataCache?
Assume that there are 10 remote segments and the __remote_log_metadata topic contains only the COPY_SEGMENT_FINISHED events; the COPY_SEGMENT_STARTED event gets compacted away because the key is the same.

Do we need a separate key for the COPY_SEGMENT_STARTED event and another key for the remaining states?

Current key format: TopicIdPartition:EndOffset:BrokerLeaderEpoch
Proposed key format: TopicIdPartition:EndOffset:BrokerLeaderEpoch:x/y
where x denotes an identifier for COPY_SEGMENT_STARTED and y denotes all the other events.

Thanks,
Kamal

On Tue, Mar 31, 2026 at 8:23 AM Lijun Tong wrote:

Hi Kamal,

The scenario you described only happens with old-version RemoteLogSegmentUpdateMetadata messages, since the endOffset will be added in the new RemoteLogSegmentUpdateMetadata schema. For the existing RemoteLogSegmentUpdateMetadata messages, we rely on the time-based retention policy to clean them up. Does that make sense?

Best,
Lijun Tong

On Mon, Mar 30, 2026 at 18:14 Kamal Chandraprakash wrote:

Hi Lijun,

The RemoteLogSegmentUpdateMetadata event does not have all the fields/attributes that the RemoteLogSegmentMetadata event has.

Assume that after compaction, for a segment, we have only COPY_SEGMENT_FINISHED records. How do you plan to retrieve the other fields after a broker restart?

Thanks,
Kamal

On Mon, Mar 30, 2026, 23:22 Lijun Tong wrote:

Hi Kamal,

Thanks for taking another look at the KIP.
1. I have removed the leftover line about using another new topic from the KIP.
2.

> 2. Assume that the topic is enabled with compaction and only one event is
> maintained per segment.
> If there is a transient error in the remote log deletion, then the
> COPY_SEGMENT started / finished events might be compacted by the
> DELETE_SEGMENT_STARTED events. If the broker is restarted during this time,
> will there be dangling remote log segments? Currently, during restart, the
> broker discards the events if it does not see the COPY_SEGMENT_STARTED events.

I am glad you asked this question. I didn't mention this part in my current design to avoid distracting from the main design, but I plan to add another background thread that cleans up all stale messages by comparing each message's endOffset with the topic partition's log start offset. I believe this would help remove all the dangling messages.

Thanks,
Lijun Tong

On Sun, Mar 29, 2026 at 22:48 Kamal Chandraprakash wrote:

Hi Lijun,

Sorry for the late reply. I went over the KIP again. Overall LGTM. A few points:

> This KIP aims to solve this issue through introducing another compacted topic for the brokers to bootstrap the state from

1. Shall we update the motivation section to mention that another topic is not introduced?
2. Assume that the topic is enabled with compaction and only one event is maintained per segment. If there is a transient error in the remote log deletion, then the COPY_SEGMENT started / finished events might be compacted by the DELETE_SEGMENT_STARTED events. If the broker is restarted during this time, will there be dangling remote log segments?
Currently, during restart, the broker discards the events if it does not see the COPY_SEGMENT_STARTED events.

Thanks,
Kamal

On Thu, Mar 26, 2026 at 5:08 AM Lijun Tong wrote:

Hi,

I have started a vote thread for this KIP, since all questions raised so far have been answered. I am happy to continue the discussion if needed; otherwise, this is a friendly reminder to vote on this KIP.

Thanks,
Lijun Tong

On Mon, Jan 19, 2026 at 17:59 Lijun Tong wrote:

Hey Kamal,

Thanks for raising these questions. Here are my responses:

Q1 and Q2: I think both questions boil down to how to release this new feature, and both are valid concerns. The solution I have in mind is that this feature is *gated by the metadata version*. The new tombstone semantics and the additional fields (for example, in RemoteLogSegmentUpdateRecord) are only enabled once the cluster metadata version is upgraded to the version that introduces this feature. As long as the cluster metadata version is not bumped, the system will not produce tombstone records. Therefore, during rolling upgrades (mixed 4.2/4.3 brokers), the feature remains effectively disabled.
Tombstones will only start being produced after the metadata version is upgraded, at which point all brokers are already required to support the new behavior.

Since Kafka does not support metadata version downgrades at the moment, once a metadata version that supports this feature is enabled, the cluster cannot be downgraded to a version that does not support it. I will add these details to the KIP later.

Q3. This is an *editing mistake* in the KIP. Thanks for pointing it out; the enum value has already been corrected in the latest revision to remove the unused placeholder and keep the state values contiguous and consistent.

Q4. So far, I don't foresee the quota mechanism interfering with the state transitions in any way; let me know if you have any concerns.

Thanks,
Lijun

On Sun, Jan 18, 2026 at 00:40 Kamal Chandraprakash wrote:

Hi Lijun,

Thanks for updating the KIP!

The updated migration plan looks clean to me. A few questions:

1. The ConsumerTask in the 4.2 Kafka build does not handle tombstone records. Should the tombstone records be sent only when all the brokers have been upgraded to 4.3?

2. Once all the brokers are upgraded and the __remote_log_metadata topic's cleanup policy is changed to compact, downgrading the brokers is no longer allowed, because producing records without a key to a compacted topic throws an error. Shall we mention this in the compatibility section?

3. In the RemoteLogSegmentState enum, why is the value 1 marked as unused?

4. Regarding the key (TopicIdPartition:EndOffset:BrokerLeaderEpoch), we may have to check for scenarios where there is segment lag due to the remote log write quota. Will the state transitions work correctly? I will come back to this later.

Thanks,
Kamal

On Fri, Jan 16, 2026 at 4:50 AM jian fu wrote:

Hi Lijun and Kamal,

I also think we don't need to keep the delete policy in the final config. If we did, we would always have to remember that the retention time of every one of our topics must be less than that value, right? It is a safeguard that carries its own risk.

Regards,
Jian

On Fri, Jan 16, 2026 at 06:45 Lijun Tong wrote:

Hey Kamal,

Some additional points about Q4:

> The user can decide when to change their internal topic cleanup policy to compact.
> If someone retains the data in the remote storage for 3 months, then they can
> migrate to the compacted topic 3 months after rolling out this change, and
> update their cleanup policy to [compact, delete].

I don't think it's a good idea to keep delete in the final cleanup policy for the `__remote_log_metadata` topic, as this still requires the user to keep track of the maximum retention of the topics that have remote storage enabled, which is an operational burden. It's also hard to reason about what will happen if the user configures the wrong retention.ms. I hope this makes sense.

Thanks,
Lijun Tong

On Thu, Jan 15, 2026 at 11:43 Lijun Tong wrote:

Hey Kamal,

Thanks for your reply! I am glad we are now on the same page about making compaction of the __remote_log_metadata topic optional for the user. I will update the KIP with this change.
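The retention bookkeeping Lijun objects to can be made concrete with a small sketch. Assume that, under a `delete` policy, `__remote_log_metadata` must keep its records at least as long as every remote-storage-enabled topic keeps its data. An operator would then have to re-run a check like the hypothetical `topics_outliving_metadata` helper below whenever any topic's `retention.ms` changes. The helper and the topic names are illustrative, not a Kafka API. The only Kafka behavior the sketch relies on is the documented convention that `retention.ms = -1` means no time limit.

```python
UNLIMITED = -1  # Kafka convention: retention.ms = -1 means no time limit


def topics_outliving_metadata(metadata_retention_ms, tiered_retentions_ms):
    """Return topics whose remote data could outlive their metadata records.

    metadata_retention_ms: retention.ms of __remote_log_metadata.
    tiered_retentions_ms: dict of topic name -> retention.ms for every topic
        with remote storage enabled (hypothetical input).
    An empty result means the metadata topic's retention covers every topic.
    """
    if metadata_retention_ms == UNLIMITED:
        return []  # metadata is never deleted by time
    return sorted(
        topic
        for topic, retention_ms in tiered_retentions_ms.items()
        if retention_ms == UNLIMITED or retention_ms > metadata_retention_ms
    )


DAY_MS = 24 * 60 * 60 * 1000
topics = {"orders": 7 * DAY_MS, "audit": 90 * DAY_MS, "clicks": UNLIMITED}
print(topics_outliving_metadata(30 * DAY_MS, topics))  # ['audit', 'clicks']
```

A single topic with unlimited retention makes every finite metadata retention unsafe. That is one reason a `delete` policy is fragile as a permanent safeguard.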
For Q2: with the key designed as TopicId:Partition:EndOffset:BrokerLeaderEpoch, even if the same broker retries the upload multiple times for the same log segment, the latest retry attempt, with the latest segment UUID, will overwrite the previous attempts' values since they share the same key. So we don't need to explicitly track the failed upload metadata, because the later attempt has already replaced it. That's my understanding of the RLMCopyTask; correct me if I am wrong.

Thanks,
Lijun Tong

On Wed, Jan 14, 2026 at 21:18 Kamal Chandraprakash wrote:

Hi Lijun,

Thanks for the reply!

Q1: Sounds good. Could you clarify in the KIP that the same partitioner will be used?

Q2: With the TopicId:Partition:EndOffset:BrokerLeaderEpoch key, if the same broker retries the upload due to intermittent issues in object storage or in the RLMM, then the metadata of those failed uploads also needs to be cleared.

Q3: We may have to skip the null-value records in the ConsumerTask.
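A toy replay loop can illustrate the two behaviors discussed in Q2 and Q3. A retry that reuses the deterministic key replaces the failed attempt's entry, and a null-value record (tombstone) is skipped as a no-op. `SegmentKey` and `replay` are hypothetical names. This is not the ConsumerTask or RemoteLogMetadataCache code, and it does not model what happens to the failed attempt's data in remote storage, which is Kamal's concern in Q2.

```python
from typing import NamedTuple


class SegmentKey(NamedTuple):
    """Hypothetical stand-in for the TopicId:Partition:EndOffset:BrokerLeaderEpoch key."""
    topic_id: str
    partition: int
    end_offset: int
    broker_leader_epoch: int


def replay(records):
    """Rebuild a toy cache from (key, value) pairs read in log order.

    - A later value for the same key overwrites the earlier one, so a retried
      upload (new segment UUID, same key) replaces the failed attempt's entry.
    - A None value (tombstone) is skipped as a no-op, as proposed for Q3.
    Returns the cache and the number of tombstones skipped.
    """
    cache = {}
    skipped = 0
    for key, value in records:
        if value is None:
            skipped += 1
            continue
        cache[key] = value
    return cache, skipped


key = SegmentKey("topic-id-1", 0, 99, 5)
records = [
    (key, {"segment_id": "uuid-a", "state": "COPY_SEGMENT_STARTED"}),  # failed attempt
    (key, {"segment_id": "uuid-b", "state": "COPY_SEGMENT_STARTED"}),  # retry
    (key, {"segment_id": "uuid-b", "state": "COPY_SEGMENT_FINISHED"}),
    (SegmentKey("topic-id-1", 0, 49, 5), None),  # tombstone for an expired segment
]
cache, skipped = replay(records)
print(cache[key]["segment_id"], cache[key]["state"], skipped)
# uuid-b COPY_SEGMENT_FINISHED 1
```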
Q4a: The idea is to keep the cleanup policy as "delete" and also send the tombstone markers to the existing `__remote_log_metadata` topic, and to handle the tombstone records in the ConsumerTask.

The user can decide when to change their internal topic cleanup policy to compact. If someone retains the data in the remote storage for 3 months, then they can migrate to the compacted topic 3 months after rolling out this change, and update their cleanup policy to [compact, delete].

Thanks,
Kamal

On Thu, Jan 15, 2026 at 4:12 AM Lijun Tong wrote:

Hey Jian,

Thanks for taking the time to review this KIP. I appreciate that you proposed a simpler migration path for onboarding the new feature.

There are 2 points that I think can be refined further:

1) Make compaction of the topic optional. The new feature will still emit tombstone messages for expired log segments even while the topic is in time-based retention mode, so once the user switches to the compacted topic, those expired messages can still be deleted even though the topic is no longer retention-based.
2) Expose a flag to the user that indicates whether the topic can be flipped to compacted, by checking whether all the old-format keyless messages have expired, and allow the user to flip to compacted only when the flag is true.

Thanks for sharing your idea. I will update the KIP later with this new idea.

Best,
Lijun Tong

On Mon, Jan 12, 2026 at 04:55 jian fu wrote:

Hi Lijun Tong,

Thanks for your KIP, which raises this critical issue.

What about keeping just one topic instead of introducing another one? For migrating the existing topic data, maybe we can solve the issue this way:

(1) Set the retention of the metadata topic to be greater than the retention time of every topic that has remote storage enabled.
(2) Deploy the new Kafka version with the feature that sends messages with a key.
(3) Wait until all the old messages have expired and new messages with keys are coming into the topic.
(4) Convert the topic to compact.

I haven't tested it; I am just proposing this solution based on a code review, for your reference. The steps may be a little complex, but they avoid adding a new topic.

Regards,
Jian

On Thu, Jan 8, 2026 at 09:17 Lijun Tong wrote:

Hey Kamal,

Thanks for taking the time to review.

Here are my responses to your questions:

Q1: At this point, I don't see a need to change RemoteLogMetadataTopicPartitioner for this design. Nothing in the current approach appears to require a partitioner change, but I'm open to revisiting this if a concrete need arises.

Q2: I have some reservations about using SegmentId:State as the key. A practical challenge we see today is that the same logical segment can be retried multiple times with different SegmentIds across brokers. If the key is SegmentId-based, it becomes harder to discover and tombstone all related attempts when the segment eventually expires. The TopicId:Partition:EndOffset:BrokerLeaderEpoch key is deterministic for a logical segment attempt and helps group retries by epoch, which simplifies cleanup and reasoning about state. I'd love to understand the benefits you're seeing with SegmentId:State compared to the offset/epoch-based key so we can weigh the trade-offs.
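The retry trade-off in Q2 can be illustrated with a simplified compaction model that keeps only the latest record per key. The model ignores tombstones, the active segment, and cleaner timing. The event fields and both key helpers are illustrative assumptions. With the offset/epoch key, a failed attempt and its retry collapse into a single key. With SegmentId:State, each attempt and state leaves its own key, and each of those keys would need its own tombstone when the segment expires.

```python
def compact(events, key_fn):
    """Simplified compaction: keep only the latest event for each key.

    Ignores tombstones, the active segment, and log cleaner timing.
    """
    latest = {}
    for event in events:
        latest[key_fn(event)] = event
    return latest


def offset_epoch_key(e):
    # TopicId:Partition:EndOffset:BrokerLeaderEpoch, as proposed in the KIP
    return f"{e['topic_id']}:{e['partition']}:{e['end_offset']}:{e['epoch']}"


def segment_state_key(e):
    # SegmentId:State, the alternative raised in this thread
    return f"{e['segment_id']}:{e['state']}"


def event(segment_id, state):
    """One metadata event for the same logical segment (hypothetical fields)."""
    return {"topic_id": "t1", "partition": 0, "end_offset": 99, "epoch": 5,
            "segment_id": segment_id, "state": state}


# A failed upload attempt (uuid-a) followed by a successful retry (uuid-b).
events = [
    event("uuid-a", "COPY_SEGMENT_STARTED"),
    event("uuid-b", "COPY_SEGMENT_STARTED"),
    event("uuid-b", "COPY_SEGMENT_FINISHED"),
]

for key_fn in (offset_epoch_key, segment_state_key):
    survivors = compact(events, key_fn)
    # Each surviving key would need its own tombstone once the segment expires.
    print(key_fn.__name__, len(survivors))
# offset_epoch_key 1
# segment_state_key 3
```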
On partitioning: with this proposal, all states for a given user topic-partition still map to the same metadata partition. That remains true both for the existing __remote_log_metadata topic (unchanged partitioner) and for the new __remote_log_metadata_compacted topic, preserving the properties RemoteLogMetadataCache relies on.

Q3: It should be fine for ConsumerTask to ignore tombstone records (null values) as a no-op.

Q4: Although TBRLMM is a sample RLMM implementation, it's currently the only OSS option and is widely used. The new __remote_log_metadata_compacted topic offers clear operational benefits in that context. We can also provide a configuration that lets users choose whether they want to keep the audit topic (__remote_log_metadata) in their cluster.
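The partitioning property Lijun describes holds as long as the metadata partition is derived from the user topic-partition alone and not from the record key. Here is a minimal sketch that assumes a hypothetical record layout. `zlib.crc32` is only a stand-in: Kafka's `RemoteLogMetadataTopicPartitioner` uses its own hashing, which is not reproduced here. The partition count of 50 is likewise just an example value.

```python
import zlib


def metadata_partition(record, num_metadata_partitions):
    """Map a metadata record to a partition of the metadata topic.

    Only the user topic-partition is hashed; end offset, epoch, and state are
    ignored, so all events for one topic-partition share a partition.
    zlib.crc32 is an illustrative stand-in, not the hash Kafka uses.
    """
    topic_partition = f"{record['topic_id']}:{record['partition']}"
    return zlib.crc32(topic_partition.encode("utf-8")) % num_metadata_partitions


NUM_PARTITIONS = 50  # example value only
records = [
    {"topic_id": "t1", "partition": 0, "end_offset": end, "epoch": epoch, "state": state}
    for end in (49, 99)
    for epoch in (4, 5)
    for state in ("COPY_SEGMENT_STARTED", "COPY_SEGMENT_FINISHED", "DELETE_SEGMENT_STARTED")
]
targets = {metadata_partition(r, NUM_PARTITIONS) for r in records}
print(len(targets))  # 1: every event for t1-0 lands in the same metadata partition
```

Because the key never enters the partition calculation, switching between key formats (offset/epoch-based or SegmentId:State) does not by itself move records across metadata partitions.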
Q4a: Enabling compaction on __remote_log_metadata alone may not fully address the unbounded growth, since we also need to emit tombstones for expired keys to delete them. Deferring compaction and tombstoning to user configuration could complicate the code flow, add operational complexity, and make outcomes less predictable. The proposal aims to provide a consistent experience by defining deterministic keys and emitting tombstones as part of the broker's responsibilities, while still allowing users to opt out of the audit topic if they prefer. But I am open to more discussion if there is a concrete need I haven't foreseen.

Thanks,
Lijun Tong

On Tue, Jan 6, 2026 at 01:01 Kamal Chandraprakash wrote:

Hi Lijun,

Thanks for the KIP! I went over it in a first pass.
A few questions:

1. Are we going to maintain the same RemoteLogMetadataTopicPartitioner <https://sourcegraph.com/github.com/apache/kafka/-/blob/storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/RemoteLogMetadataTopicPartitioner.java> for both topics? It is not clear in the KIP; could you clarify it?
2. Can the key be changed to SegmentId:State instead of TopicId:Partition:EndOffset:BrokerLeaderEpoch if the same partitioner is used? It is good to maintain all the segment states for a user topic-partition in the same metadata partition.
3. Should we handle the records with a null value (tombstone) in the ConsumerTask <https://sourcegraph.com/github.com/apache/kafka/-/blob/storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/ConsumerTask.java?L166>?
4. TBRLMM <https://sourcegraph.com/github.com/apache/kafka/-/blob/storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/TopicBasedRemoteLogMetadataManager.java> is a sample plugin implementation of RLMM. I'm not sure whether the community will agree to add one more internal topic for this plugin implementation.
4a. Can we modify the new messages to the __remote_log_metadata topic to contain the key, and leave it to the user to enable compaction for this topic if they need it?
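Kamal's Q2 here, and his later `TopicIdPartition:EndOffset:BrokerLeaderEpoch:x/y` proposal at the top of this thread, both turn on which records survive compaction. The sketch below assumes, as in Kamal's question, that only the COPY_SEGMENT_STARTED record carries fields such as the start offset. If the new update-record schema carries those fields, the first key scheme loses nothing. The key helpers and fields are hypothetical, and compaction is reduced to keep-latest-per-key.

```python
def compact(events, key_fn):
    """Simplified compaction: keep only the latest event for each key."""
    latest = {}
    for event in events:
        latest[key_fn(event)] = event
    return list(latest.values())


def per_segment_key(e):
    # TopicIdPartition:EndOffset:BrokerLeaderEpoch
    return f"{e['tp']}:{e['end_offset']}:{e['epoch']}"


def state_suffixed_key(e):
    # Hypothetical rendering of the ":x/y" suffix: x for COPY_SEGMENT_STARTED, y otherwise
    suffix = "x" if e["state"] == "COPY_SEGMENT_STARTED" else "y"
    return f"{per_segment_key(e)}:{suffix}"


def start_offset_recoverable(events):
    """True if some surviving event still carries the segment's start offset."""
    return any(e.get("start_offset") is not None for e in events)


# Assumption: only the COPY_SEGMENT_STARTED record carries start_offset.
segment_events = [
    {"tp": "t1-0", "end_offset": 99, "epoch": 5,
     "state": "COPY_SEGMENT_STARTED", "start_offset": 50},
    {"tp": "t1-0", "end_offset": 99, "epoch": 5, "state": "COPY_SEGMENT_FINISHED"},
]

for key_fn in (per_segment_key, state_suffixed_key):
    survivors = compact(segment_events, key_fn)
    print(key_fn.__name__, start_offset_recoverable(survivors))
# per_segment_key False
# state_suffixed_key True
```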
Thanks,
Kamal

On Tue, Jan 6, 2026 at 7:35 AM Lijun Tong wrote:

Hey Henry,

Thank you for your time and response! I really like your KIP-1248 about offloading the consumption of remote logs away from the broker, and I think that with that change, topics with tiered storage enabled can also have longer retention configurations and would benefit from this KIP too.
> Some suggestions: In your example scenarios, it would also be good to add an
> example of remote log segment deletion triggered by the retention policy, which
> will trigger generation of a tombstone event into the metadata topic and
> trigger log compaction/deletion 24 hours later. I think this is the key event
> to cap the metadata topic size.

Regarding this suggestion, I am not sure whether Scenario 4 <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406618613#KIP1266:BoundingTheNumberOfRemoteLogMetadataMessagesviaCompactedTopic-Scenario4:SegmentDeletion> has covered it.
I can add more rows to the Timeline Table, such as T5+24hour, to show explicitly that the messages are gone by then and that the number of messages in the topic is therefore capped.

Regarding whether the __remote_log_metadata topic is still necessary, I am inclined to keep it, at least for debugging purposes, so that we can build confidence in the compacted topic change. We can always choose to remove it in the future once we all agree it provides limited value to users.
Thanks,
Lijun Tong

Henry Haiying Cai via dev <[email protected]> wrote on Monday, January 5, 2026 at 16:19:

Lijun,

Thanks for the proposal. I like your idea of using a compacted topic for the tiered storage metadata topic.

In our setup, we have set a shorter retention (3 days) on the tiered storage metadata topic to control its size growth. We can do that because we control the retention policy of every topic in our clusters and apply a uniform retention policy to all our tiered storage topics. I can see that other users and companies will not be able to enforce that retention policy on all their tiered storage topics.
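Henry's setup relies on an invariant worth stating explicitly: when the metadata topic uses time-based retention, no tiered storage topic should retain its data longer than the metadata topic retains its records, or remote segments can outlive the metadata that tracks them. The sketch below checks that invariant. The helper name, topic names, and retention values are hypothetical, and it roughly models only time-based retention (it ignores retention.bytes and the gap between when a segment is written and when its metadata record is produced).

```python
DAY_MS = 24 * 60 * 60 * 1000
INFINITE = -1  # Kafka uses retention.ms=-1 for "no time limit"


def topics_at_risk(metadata_retention_ms: int,
                   topic_retention_ms: dict[str, int]) -> list[str]:
    """Hypothetical check: return tiered topics whose remote segments could
    outlive the metadata records describing them (time-based retention only)."""
    if metadata_retention_ms == INFINITE:
        return []  # metadata never expires by time, so nothing is at risk
    at_risk = []
    for topic, retention_ms in sorted(topic_retention_ms.items()):
        # A segment kept longer than its metadata becomes untracked.
        if retention_ms == INFINITE or retention_ms > metadata_retention_ms:
            at_risk.append(topic)
    return at_risk


# Hypothetical topics and retention values.
topics = {"clicks": 3 * DAY_MS, "orders": 7 * DAY_MS, "audit": INFINITE}
print(topics_at_risk(3 * DAY_MS, topics))  # ['audit', 'orders']
print(topics_at_risk(INFINITE, topics))    # []
```

With a uniform 3-day retention everywhere, the list is empty, which is why the approach works in Henry's clusters but not in clusters where topic owners choose their own retention.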
Some suggestions: In your example scenarios, it would also be good to add an example of remote log segment deletion triggered by the retention policy, which will generate a tombstone event in the metadata topic and trigger log compaction/deletion 24 hours later. I think this is the key event that caps the metadata topic size.

For the original unbounded remote_log_metadata topic, I am not sure whether we still need it. If it is kept only as an audit trail, people can set up a data ingestion pipeline to copy the contents of the metadata topic into a separate storage location. I think we could have a flag to keep only one metadata topic (the compacted version).
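Henry's tombstone suggestion, and the T5+24hour rows Lijun offers to add, can be modelled in a few lines. This is a simplified sketch, not Kafka's log cleaner: it assumes a single partition that is fully eligible for cleaning, hypothetical keys such as seg-1, and a one-day tombstone window matching Kafka's default delete.retention.ms. The real cleaner measures that window from a different starting point, so treat the timings as approximate.

```python
from dataclasses import dataclass
from typing import Optional

DAY_MS = 24 * 60 * 60 * 1000
# Assumption: tombstones are kept for one day, Kafka's default delete.retention.ms.
DELETE_RETENTION_MS = DAY_MS


@dataclass
class Record:
    key: str              # hypothetical per-segment key
    value: Optional[str]  # None models a tombstone
    timestamp_ms: int


def compact(log: list[Record], now_ms: int) -> list[Record]:
    """Simplified compaction: keep only the newest record per key, and drop
    tombstones whose age has reached DELETE_RETENTION_MS."""
    newest: dict[str, int] = {}
    for index, record in enumerate(log):
        newest[record.key] = index  # later offsets win
    survivors = []
    for index, record in enumerate(log):
        if newest[record.key] != index:
            continue  # superseded by a newer record with the same key
        expired = now_ms - record.timestamp_ms >= DELETE_RETENTION_MS
        if record.value is None and expired:
            continue  # tombstone has done its job and is removed
        survivors.append(record)
    return survivors


T5 = 100  # time (ms) at which retention deletes seg-1
log = [
    Record("seg-1", "COPY_SEGMENT_FINISHED", 0),
    Record("seg-2", "COPY_SEGMENT_FINISHED", 50),
    Record("seg-1", None, T5),  # tombstone written when seg-1 is deleted
]
print([r.key for r in compact(log, T5 + 1)])       # ['seg-2', 'seg-1']
print([r.key for r in compact(log, T5 + DAY_MS)])  # ['seg-2']
```

Right after T5 the tombstone still shadows seg-1; one tombstone window later only the live segment remains, so the record count tracks live segments rather than total history.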
On Monday, January 5, 2026 at 01:22:42 PM PST, Lijun Tong <[email protected]> wrote:

Hello Kafka Community,

I would like to start a discussion on KIP-1266, which proposes adding a new compacted remote log metadata topic for tiered storage, to limit the number of messages that must be iterated to build the remote metadata state.
KIP link: KIP-1266 Bounding The Number Of RemoteLogMetadata Messages via Compacted RemoteLogMetadata Topic
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-1266%3A+Bounding+The+Number+Of+RemoteLogMetadata+Messages+via+Compacted+Topic>

Background:
The current Tiered Storage implementation uses a __remote_log_metadata topic with infinite retention and a delete-based cleanup policy. This leads to unbounded growth and slow broker bootstrap, leaves no mechanism to clean up expired segment metadata, and forces inefficient re-reading from offset 0 during leadership changes.
Proposal:
A dual-topic approach that introduces a new __remote_log_metadata_compacted topic, which uses log compaction with deterministic offset-based keys, while preserving the existing topic for audit history. The approach allows brokers to build their metadata cache exclusively from the compacted topic, enables cleanup of expired segment metadata through tombstones, and includes a migration strategy to populate the new topic during upgrade. The result is bounded metadata growth and faster broker startup while maintaining backward compatibility.

More details are in the linked KIP.
Looking forward to your thoughts.

Thank you for your time!
Best,
Lijun Tong
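The proposal's core claim, that deterministic keys let brokers rebuild their cache from the compacted topic alone, can be checked with a toy replay. This sketch assumes the key shape TopicIdPartition:EndOffset:BrokerLeaderEpoch quoted in Kamal's message in this thread; `metadata_key`, `rebuild_cache`, the `t1-0` partition string, and the state-name values are all hypothetical, and `None` stands in for a tombstone. It deliberately ignores which fields each record type carries, which is exactly the concern Kamal raises about RemoteLogSegmentMetadataUpdateRecord.

```python
from typing import Iterable, Optional


def metadata_key(topic_id_partition: str, end_offset: int,
                 leader_epoch: int) -> str:
    """Hypothetical deterministic key: every update for the same remote
    segment maps to the same key, so compaction keeps only the latest."""
    return f"{topic_id_partition}:{end_offset}:{leader_epoch}"


def rebuild_cache(records: Iterable[tuple[str, Optional[str]]]) -> dict[str, str]:
    """Replay (key, value) pairs in offset order. A None value is a
    tombstone and evicts the segment from the cache."""
    cache: dict[str, str] = {}
    for key, value in records:
        if value is None:
            cache.pop(key, None)
        else:
            cache[key] = value
    return cache


seg_a = metadata_key("t1-0", 99, 0)
seg_b = metadata_key("t1-0", 199, 0)

# Simplified replay of every record ever written under these two keys.
full_history = [
    (seg_a, "COPY_SEGMENT_STARTED"),
    (seg_a, "COPY_SEGMENT_FINISHED"),
    (seg_b, "COPY_SEGMENT_STARTED"),
    (seg_b, "COPY_SEGMENT_FINISHED"),
    (seg_a, None),  # seg_a deleted by retention: tombstone
]
# What the compacted topic holds once the tombstone itself is cleaned.
compacted = [(seg_b, "COPY_SEGMENT_FINISHED")]

print(rebuild_cache(full_history) == rebuild_cache(compacted))  # True
print(len(full_history), len(compacted))  # 5 1
```

Both replays yield the same cache, but the compacted one reads far fewer records: the full history grows with every event ever written, while the compacted topic converges toward roughly one record per live segment.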
