Hey team,

I have finished the code change for this KIP; the PR is here: https://github.com/apache/kafka/pull/22459. There are 2 commits in this PR: the first commit includes all the source code changes, and the second commit includes all the test changes. I have adopted some of the advice from this email chain and have updated the KIP and code accordingly.
Would love to hear your thoughts.

Thanks,
Lijun Tong

On Sat, Apr 11, 2026 at 18:38, jian fu <[email protected]> wrote:

> Hi Kamal and Lijun Tong,
>
> Maybe another approach (not a good solution) is that, once we are sure the work is complete, we send a null value for every key to delete all the data. That means the keys are:
> clusterid:topic:partition:offset: COPY_SEGMENT_FINISHED
> clusterid:topic:partition:offset: DELETE_SEGMENT_STARTED
> clusterid:topic:partition:offset: DELETE_SEGMENT_FINISHED
> ----------------
>
> This solution won't change the old payload in the message, so it is somewhat strange. Alternatively, each record could carry the complete message, so that keeping only the latest record might work.
>
> BTW: there are four message types that need to be taken care of:
> RemoteLogSegmentMetadataRecord
> RemoteLogSegmentMetadataUpdateRecord
> RemotePartitionDeleteMetadataRecord
> RemoteLogSegmentMetadataSnapshotRecord
>
> Regards,
> Jian
>
> On Tue, Apr 7, 2026 at 08:35, Lijun Tong <[email protected]> wrote:
>
> > Hi Kamal,
> >
> > I now see what you mean regarding the DELETE_SEGMENT_STARTED handling.
> >
> > I was thinking that the broker's logic that currently ignores log segment messages when COPY_SEGMENT_STARTED doesn't exist might need to be updated for the new message format. Since the new message contains useful information like endOffset, topicID, and partition, it could provide sufficient context to help with the retry of deletion operations. This suggests the broker might not need to ignore these messages solely because of the absence of COPY_SEGMENT_STARTED.
> >
> > With the new key-based format, we would have enough metadata in the DELETE_SEGMENT_STARTED event itself to:
> > - Identify the remote segment location
> > - Track and retry the deletion process
> > - Prevent orphaned segments in remote storage
> >
> > I believe this means that with the new keyed message format, we could avoid orphaned remote segments that might occur when COPY_SEGMENT_STARTED is cleaned by the retention policy. The DELETE_SEGMENT_STARTED event would carry all the necessary information to complete the deletion, regardless of whether the base metadata still exists.
> >
> > For the old format messages, the same mechanism (ignoring when COPY_SEGMENT_STARTED is missing) could still apply, as those messages lack the necessary information for independent processing.
> >
> > I'd appreciate your thoughts on this approach.
> >
> > Best,
> > Lijun Tong
> >
> > On Sun, Apr 5, 2026 at 17:37, Kamal Chandraprakash <[email protected]> wrote:
> >
> > > Hi Lijun,
> > >
> > > Yes, we can get on a call to close out on this. My concern is that if the same key is maintained for a given segment's metadata records, then the newer messages (COPY_FINISHED, DELETE_STARTED) might override/compact the previous COPY_STARTED events. This is not about the old / new format of messages. Assume that all the messages in the topic are in the new format and contain keys.
> > >
> > > From https://docs.confluent.io/kafka/design/log_compaction.html#topic-compaction:
> > >
> > > Topic compaction is a mechanism that allows you to retain the latest value for each message key in a topic, while discarding older values. It guarantees that the latest value for each message key is always retained within the log of data contained in that topic, making it ideal for use cases such as restoring state after system failure or reloading caches after application restarts.
> > >
> > > Thanks,
> > > Kamal
> > >
> > > On Sun, Apr 5, 2026 at 11:37 PM Lijun Tong <[email protected]> wrote:
> > >
> > > > Hi Kamal,
> > > >
> > > > Thanks for raising this.
> > > >
> > > > Currently, only the existing version of RemoteLogSegmentMetadataUpdateRecord does not include those fields. We rely on the time-based retention policy for cleanup, and this does not impact the ability to rebuild the RemoteLogMetadataCache.
> > > >
> > > > The cache reconstruction should still work correctly because it depends on the value, and we have not removed any fields from the value.
> > > >
> > > > Regarding the scenario where there are 10 remote segments and the __remote_log_metadata topic contains only COPY_SEGMENT_FINISHED events: the COPY_SEGMENT_STARTED events will not be compacted in this case, since messages with a null key are not subject to compaction.
> > > >
> > > > Once older-format messages are cleaned up by the time-based retention policy and compaction is enabled, records with the same key will be compacted asynchronously and correctly. Given this, I don’t believe we need to introduce a separate key for COPY_SEGMENT_STARTED events.
> > > >
> > > > Happy to jump on a call if it’s easier to discuss further.
> > > >
> > > > Best,
> > > > Lijun Tong
> > > >
> > > > On Sun, Apr 5, 2026 at 10:56, Lijun Tong <[email protected]> wrote:
> > > >
> > > > > Hey Kamal,
> > > > >
> > > > > I am not very clear on the question you mentioned above. I am happy to jump on a call to discuss further; I live in the PST time zone. Maybe we can meet online through Google Meet?
> > > > >
> > > > > Thanks,
> > > > > Lijun Tong
> > > > >
> > > > > On Wed, Apr 1, 2026 at 02:38, Kamal Chandraprakash <[email protected]> wrote:
> > > > >
> > > > > > Hi Lijun,
> > > > > >
> > > > > > Thanks for the update! I'm still not clear on this.
> > > > > >
> > > > > > The RemoteLogSegmentMetadataUpdateRecord does not contain the below fields compared to RemoteLogSegmentMetadataRecord:
> > > > > >
> > > > > > - startOffset
> > > > > > - endOffset (will be added as a tagged field)
> > > > > > - MaxTimestampMs
> > > > > > - SegmentLeaderEpochs
> > > > > > - SegmentSizeInBytes and
> > > > > > - TxnIndexEmpty
> > > > > >
> > > > > > When a broker gets restarted, will it be able to rebuild the RemoteLogMetadataCache? Assume that there are 10 remote segments and the __remote_log_metadata topic contains only the COPY_SEGMENT_FINISHED events; the COPY_SEGMENT_STARTED event gets compacted as the key is the same.
> > > > > >
> > > > > > Do we need a separate key for the COPY_SEGMENT_STARTED event and another key for the remaining states?
> > > > > >
> > > > > > Current key format: TopicIdPartition:EndOffset:BrokerLeaderEpoch
> > > > > > Proposed key format: TopicIdPartition:EndOffset:BrokerLeaderEpoch:x/y
> > > > > > where x denotes an identifier for COPY_SEGMENT_STARTED and y denotes an identifier for all the other events.
> > > > > >
> > > > > > Thanks,
> > > > > > Kamal
> > > > > >
> > > > > > On Tue, Mar 31, 2026 at 8:23 AM Lijun Tong <[email protected]> wrote:
> > > > > >
> > > > > > > Hi Kamal,
> > > > > > >
> > > > > > > The scenario you described only happens with the old version of the RemoteLogSegmentUpdateMetadata message, since the endOffset will be added in the new RemoteLogSegmentUpdateMetadata schema. For the existing RemoteLogSegmentUpdateMetadata messages, we rely on the time-based retention policy to clean up. Does that make sense?
> > > > > > >
> > > > > > > Best,
> > > > > > > Lijun Tong
> > > > > > >
> > > > > > > On Mon, Mar 30, 2026 at 18:14, Kamal Chandraprakash <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi Lijun,
> > > > > > > >
> > > > > > > > The RemoteLogSegmentUpdateMetadata event does not have all the fields/attributes that the RemoteLogSegmentMetadata event has.
> > > > > > > >
> > > > > > > > Assume that after compaction, for a segment, we have only COPY_SEGMENT_FINISHED records. How do you plan to retrieve the other fields after broker restart?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Kamal
> > > > > > > >
> > > > > > > > On Mon, Mar 30, 2026, 23:22 Lijun Tong <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hi Kamal,
> > > > > > > > >
> > > > > > > > > Thanks for taking another look at the KIP.
> > > > > > > > > 1. I have removed the left-over line about using another new topic from the KIP.
> > > > > > > > > 2.
> > > > > > > > >
> > > > > > > > > > 2. Assume that the topic is enabled with compaction and only one event is maintained per segment. If there is a transient error in the remote log deletion, then the COPY_SEGMENT started / finished events might be compacted by the DELETE_SEGMENT_STARTED events. If the broker is restarted during this time, will there be dangling remote log segments? Currently, during restart, the broker discards the events if it does not see the COPY_SEGMENT_STARTED events.
> > > > > > > > >
> > > > > > > > > I am glad you asked this question. I didn't mention this part in my current design to avoid distractions from the main design, but I plan to add another background thread to clean up all the stale messages by comparing the message's endOffset with the topic partition's log start offset. I believe this would help remove all the dangling messages.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Lijun Tong
> > > > > > > > >
> > > > > > > > > On Sun, Mar 29, 2026 at 22:48, Kamal Chandraprakash <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Hi Lijun,
> > > > > > > > > >
> > > > > > > > > > Sorry for the late reply. Went over the KIP again. Overall LGTM. A few points:
> > > > > > > > > >
> > > > > > > > > > > This KIP aims to solve this issue through introducing another compacted topic for the brokers to bootstrap the state from
> > > > > > > > > >
> > > > > > > > > > 1. Shall we update the motivation section to mention that another topic is not introduced?
> > > > > > > > > > 2. Assume that the topic is enabled with compaction and only one event is maintained per segment. If there is a transient error in the remote log deletion, then the COPY_SEGMENT started / finished events might be compacted by the DELETE_SEGMENT_STARTED events. If the broker is restarted during this time, will there be dangling remote log segments?
> > > > > > > > > > Currently, during restart, the broker discards the events if it does not see the COPY_SEGMENT_STARTED events.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Kamal
> > > > > > > > > >
> > > > > > > > > > On Thu, Mar 26, 2026 at 5:08 AM Lijun Tong <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi,
> > > > > > > > > > >
> > > > > > > > > > > I have started a Vote thread for this KIP, considering all questions raised so far have been answered. I am happy to continue the discussion if needed; otherwise, this is a friendly reminder on the vote for this KIP.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Lijun Tong
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Jan 19, 2026 at 17:59, Lijun Tong <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hey Kamal,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for raising these questions. Here are my responses to your questions:
> > > > > > > > > > > >
> > > > > > > > > > > > Q1 and Q2:
> > > > > > > > > > > > I think both questions boil down to how to release this new feature; both questions are valid concerns. The solution I have in mind is that this feature is *gated by the metadata version*.
> > > > > > > > > > > > The new tombstone semantics and the additional fields (for example in RemoteLogSegmentUpdateRecord) are only enabled once the cluster metadata version is upgraded to the version that introduces this feature. As long as the cluster metadata version is not bumped, the system will not produce tombstone records. Therefore, during rolling upgrades (mixed 4.2/4.3 brokers), the feature remains effectively disabled. Tombstones will only start being produced after the metadata version is upgraded, at which point all brokers are already required to support the new behavior.
> > > > > > > > > > > >
> > > > > > > > > > > > Since Kafka does not support metadata version downgrades at the moment, once a metadata version that supports this feature is enabled, it cannot be downgraded to a version that does not support it. I will add these details to the KIP later.
> > > > > > > > > > > >
> > > > > > > > > > > > Q3. This is an *editing mistake* in the KIP. Thanks for pointing it out; the enum value has already been corrected in the latest revision to remove the unused placeholder and keep the state values contiguous and consistent.
> > > > > > > > > > > >
> > > > > > > > > > > > Q4.
> > > > > > > > > > > > I don't foresee the quota mechanism interfering with the state transition in any way so far; let me know if you have any concerns.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Lijun
> > > > > > > > > > > >
> > > > > > > > > > > > On Sun, Jan 18, 2026 at 00:40, Kamal Chandraprakash <[email protected]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Lijun,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for updating the KIP!
> > > > > > > > > > > > >
> > > > > > > > > > > > > The updated migration plan looks clean to me. A few questions:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1. The ConsumerTask in the 4.2 Kafka build does not handle the tombstone records. Should the tombstone records be sent only when all the brokers are upgraded to the 4.3 version?
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2. Once all the brokers are upgraded and the __remote_log_metadata topic cleanup policy is changed to compact, downgrading the brokers is not allowed, as records without a key will throw an error while producing to the compacted topic. Shall we mention this in the compatibility section?
> > > > > > > > > > > > >
> > > > > > > > > > > > > 3. In the RemoteLogSegmentState Enum, why is the value 1 marked as unused?
> > > > > > > > > > > > >
> > > > > > > > > > > > > 4.
> > > > > > > > > > > > > Regarding the key (TopicIdPartition:EndOffset:BrokerLeaderEpoch), we may have to check for scenarios where there is segment lag due to the remote log write quota. Will the state transition work correctly? Will come back to this later.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Kamal
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Jan 16, 2026 at 4:50 AM jian fu <[email protected]> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Lijun and Kamal,
> > > > > > > > > > > > > > I also think we don't need to keep the delete policy in the final config. If we did, we would always have to remember that the retention time of all our topics must be less than that value, right? It is a safeguard that carries risk.
> > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > Jian
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, Jan 16, 2026 at 06:45, Lijun Tong <[email protected]> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hey Kamal,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Some additional points about Q4:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The user can decide when to change their internal topic cleanup policy to compact.
> > > > > > > > > > > > > > > > If someone retains the data in the remote storage for 3 months, then they can migrate to the compacted topic after 3 months post rolling out this change. And, update their cleanup policy to [compact, delete].
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I don't think it's a good idea to keep delete in the final cleanup policy for the topic `__remote_log_metadata`, as this still requires the user to keep track of the max retention hours of topics that have remote storage enabled, and it's an operational burden. It's also hard to reason about what will happen if the user configures the wrong retention.ms. I hope this makes sense.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > Lijun Tong
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Thu, Jan 15, 2026 at 11:43, Lijun Tong <[email protected]> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hey Kamal,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks for your reply!
> > > > > > > > > > > > > > > > I am glad we are on the same page about making compaction of the __remote_log_metadata topic optional for the user now. I will update the KIP with this change.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > For Q2:
> > > > > > > > > > > > > > > > With the key designed as TopicId:Partition:EndOffset:BrokerLeaderEpoch, even if the same broker retries the upload multiple times for the same log segment, the latest retry attempt with the latest segment UUID will overwrite the previous attempts' values since they share the same key. So we don't need to explicitly track the failed upload metadata, because it is already superseded by the later attempt. That's my understanding of the RLMCopyTask; correct me if I am wrong.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > Lijun Tong
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Wed, Jan 14, 2026 at 21:18, Kamal Chandraprakash <[email protected]> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi Lijun,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks for the reply!
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Q1: Sounds good. Could you clarify in the KIP that the same partitioner will be used?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Q2: With the TopicId:Partition:EndOffset:BrokerLeaderEpoch key, if the same broker retries the upload due to intermittent issues in object storage or the RLMM, then the metadata of those failed uploads also needs to be cleared.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Q3: We may have to skip the null value records in the ConsumerTask.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Q4a: The idea is to keep the cleanup policy as "delete" and also send the tombstone markers to the existing `__remote_log_metadata` topic. And, handle the tombstone records in the ConsumerTask.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The user can decide when to change their internal topic cleanup policy to compact.
> > > > > > > > > > > > > > > > > If someone retains the data in the remote storage for 3 months, then they can migrate to the compacted topic after 3 months post rolling out this change. And, update their cleanup policy to [compact, delete].
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > Kamal
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Thu, Jan 15, 2026 at 4:12 AM Lijun Tong <[email protected]> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hey Jian,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks for your time to review this KIP. I appreciate that you proposed a simpler migration solution to onboard the new feature.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > There are 2 points that I think can be further refined:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 1).
> > > > > > > > > > > > > > > > > > Make compaction of the topic optional, although the new feature will continue to emit tombstone messages for those expired log segments even when the topic is still in time-based retention mode, so once the user switches to using the compacted topic, those expired messages can still be deleted even though the topic is no longer retention-based.
> > > > > > > > > > > > > > > > > > 2). We need to expose some flag to the user to indicate whether the topic can be flipped to compacted, by checking whether all the old-format key-less messages have expired, and allow the user to flip to compacted only when the flag is true.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks for sharing your idea. I will update the KIP later with this new idea.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > Lijun Tong
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Mon, Jan 12, 2026 at 04:55, jian fu <[email protected]> wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Hi Lijun Tong:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thanks for your KIP, which raises this critical issue.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > What about just keeping one topic instead of involving another topic? For the migration of the existing topic data, maybe we can solve the issue this way:
> > > > > > > > > > > > > > > > > > > (1) set the retention time to be greater than the retention time of every topic that has remote storage enabled
> > > > > > > > > > > > > > > > > > > (2) deploy the new Kafka version with the feature that sends the messages with a key
> > > > > > > > > > > > > > > > > > > (3) wait until all the old messages have expired and new messages with keys are coming to the topic
> > > > > > > > > > > > > > > > > > > (4) convert the topic to compact
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I haven't tested it.
> > > > > > > > > > > > > > > > > > > I am just proposing this solution based on a code review; it is just for your reference.
> > > > > > > > > > > > > > > > > > > The steps may be a little complex, but they avoid adding a new topic.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > > > > > > Jian
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Thu, Jan 8, 2026 at 09:17, Lijun Tong <[email protected]> wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Hey Kamal,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thanks for your time for the review.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Here is my response to your questions:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Q1 At this point, I don’t see a need to change RemoteLogMetadataTopicPartitioner for this design. Nothing in the current approach appears to require a partitioner change, but I’m open to revisiting if a concrete need arises.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Q2 I have some reservations about using SegmentId:State as the key.
> > > > > > > > > > > > > > > > > > > > A practical challenge we see today is that the same logical segment can be retried multiple times with different SegmentIds across brokers. If the key is SegmentId-based, it becomes harder to discover and tombstone all related attempts when the segment eventually expires. The TopicId:Partition:EndOffset:BrokerLeaderEpoch key is deterministic for a logical segment attempt and helps group retries by epoch, which simplifies cleanup and reasoning about state. I’d love to understand the benefits you’re seeing with SegmentId:State compared to the offset/epoch-based key so we can weigh the trade-offs.

On partitioning: with this proposal, all states for a given user topic-partition still map to the same metadata partition. That remains true for the existing __remote_log_metadata (unchanged partitioner) and for the new __remote_log_metadata_compacted, preserving the properties RemoteMetadataCache relies on.

Q3 It should be fine for ConsumerTask to ignore tombstone records (null values) and no-op.

Q4 Although TBRLMM is a sample RLMM implementation, it’s currently the only OSS option and is widely used. The new __remote_log_metadata_compacted topic offers clear operational benefits in that context.
We can also provide a configuration to let users choose whether they want to keep the audit topic (__remote_log_metadata) in their cluster.

Q4a Enabling compaction on __remote_log_metadata alone may not fully address the unbounded growth, since we also need to emit tombstones for expired keys to delete them. Deferring compaction and tombstoning to user configuration could complicate the code flow, add operational complexity, and make outcomes less predictable. The proposal aims to provide a consistent experience by defining deterministic keys and emitting tombstones as part of the broker’s responsibilities, while still allowing users to opt out of the audit topic if they prefer.
But I am open to more discussion if there is any concrete need I don't foresee.

Thanks,
Lijun Tong

Kamal Chandraprakash <[email protected]> wrote on Tue, Jan 6, 2026 at 01:01:

Hi Lijun,

Thanks for the KIP! Went over the first pass.

A few questions:

1. Are we going to maintain the same RemoteLogMetadataTopicPartitioner
<https://sourcegraph.com/github.com/apache/kafka/-/blob/storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/RemoteLogMetadataTopicPartitioner.java>
for both the topics? It is not clear in the KIP, could you clarify it?
2. Can the key be changed to SegmentId:State instead of TopicId:Partition:EndOffset:BrokerLeaderEpoch if the same partitioner is used? It is good to maintain all the segment states for a user-topic-partition in the same metadata partition.
3. Do we have to handle records with a null value (tombstone) in the ConsumerTask
<https://sourcegraph.com/github.com/apache/kafka/-/blob/storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/ConsumerTask.java?L166>
?
4. TBRLMM
<https://sourcegraph.com/github.com/apache/kafka/-/blob/storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/TopicBasedRemoteLogMetadataManager.java>
is a sample plugin implementation of RLMM. Not sure whether the community will agree to add one more internal topic for this plugin impl.
4a. Can we modify the new messages to the __remote_log_metadata topic to contain the key and leave it to the user to enable compaction for this topic if they need?

Thanks,
Kamal

On Tue, Jan 6, 2026 at 7:35 AM Lijun Tong <[email protected]> wrote:

Hey Henry,

Thank you for your time and response! I really like your KIP-1248 about offloading the consumption of remote log away from the broker, and I think with that change, the topic that enables the tiered storage can also have longer retention configurations and would benefit from this KIP too.

> Some suggestions: In your example scenarios, it would also be good to add
> an example of remote log segment deletion triggered by retention policy
> which will trigger generation of tombstone event into metadata topic and
> trigger log compaction/deletion 24 hour later, I think this is the key
> event to cap the metadata topic size.

Regarding this suggestion, I am not sure whether Scenario 4
<https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406618613#KIP1266:BoundingTheNumberOfRemoteLogMetadataMessagesviaCompactedTopic-Scenario4:SegmentDeletion>
has covered it. I can add more rows to the Timeline Table, such as T5+24hour, to show explicitly that the messages are deleted by then and that the number of messages in the topic is therefore capped.
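
As a rough illustration of that timeline, the following Python sketch models compaction on a single key. It is a simplified model under stated assumptions, not Kafka's log cleaner: `compact`, the record layout, the sample key, and the timestamps are hypothetical, all state updates for the segment are assumed to share one key, and the 24-hour tombstone window is taken from the suggestion quoted above.

```python
from typing import List, Optional, Tuple

HOUR_MS = 60 * 60 * 1000
DELETE_RETENTION_MS = 24 * HOUR_MS  # the "24 hour" window from the thread

# (timestamp_ms, key, value); a value of None marks a tombstone
Record = Tuple[int, str, Optional[str]]


def compact(log: List[Record], now_ms: int) -> List[Record]:
    """Keep the latest record per key; drop tombstones past the window."""
    # Later positions overwrite earlier ones, leaving the last index per key.
    last_index = {key: i for i, (_, key, _) in enumerate(log)}
    survivors = []
    for i, (ts, key, value) in enumerate(log):
        if last_index[key] != i:
            continue  # superseded by a newer record with the same key
        if value is None and now_ms - ts >= DELETE_RETENTION_MS:
            continue  # expired tombstone: the key disappears entirely
        survivors.append((ts, key, value))
    return survivors


key = "topicA:0:999:5"  # hypothetical TopicId:Partition:EndOffset:Epoch key
log = [
    (0 * HOUR_MS, key, "COPY_SEGMENT_STARTED"),
    (1 * HOUR_MS, key, "COPY_SEGMENT_FINISHED"),
    (2 * HOUR_MS, key, "DELETE_SEGMENT_STARTED"),
    (3 * HOUR_MS, key, None),  # tombstone written once deletion is done
]

# Shortly after the tombstone: only the tombstone is left for the key.
soon_after = compact(log, now_ms=4 * HOUR_MS)

# Once the window has passed: nothing is left for the key.
a_day_later = compact(log, now_ms=3 * HOUR_MS + DELETE_RETENTION_MS)
```

In this model the key contributes one record after the first compaction pass and none after the window elapses, which corresponds to the extra T5+24hour row proposed above.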

Regarding whether the topic __remote_log_metadata is still necessary, I am inclined to keep this topic, at least for debugging purposes, so we can build confidence in the compacted topic change. We can always choose to remove this topic in the future once we all agree it provides limited value for the users.

Thanks,
Lijun Tong

Henry Haiying Cai via dev <[email protected]> wrote on Mon, Jan 5, 2026 at 16:19:

Lijun,

Thanks for the proposal and I liked your idea of using a compacted topic for tiered storage metadata topic.

In our setup, we have set a shorter retention (3 days) for the tiered storage metadata topic to control the size growth. We can do that since we control every topic's retention policy in our clusters and we set a uniform retention.policy for all our tiered storage topics. I can see other users/companies will not be able to enforce that retention policy on all tiered storage topics.

Some suggestions: In your example scenarios, it would also be good to add an example of remote log segment deletion triggered by retention policy which will trigger generation of tombstone event into metadata topic and trigger log compaction/deletion 24 hour later, I think this is the key event to cap the metadata topic size.

For the original unbounded remote_log_metadata topic, I am not sure whether we still need it or not.
If it is left only for audit trail purposes, people can set up a data ingestion pipeline to ingest the content of the metadata topic into a separate storage location. I think we can have a flag to have only one metadata topic (the compacted version).

On Monday, January 5, 2026 at 01:22:42 PM PST, Lijun Tong <[email protected]> wrote:

Hello Kafka Community,

I would like to start a discussion on KIP-1266, which proposes to add another new compacted remote log metadata topic for the tiered storage, to limit the number of messages that need to be iterated to build the remote metadata state.

KIP link: KIP-1266 Bounding The Number Of RemoteLogMetadata Messages via Compacted RemoteLogMetadata Topic
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-1266%3A+Bounding+The+Number+Of+RemoteLogMetadata+Messages+via+Compacted+Topic>

Background:
The current Tiered Storage implementation uses a __remote_log_metadata topic with infinite retention and delete-based cleanup policy, causing unbounded growth, slow broker bootstrap, no mechanism to clean up expired segment metadata, and inefficient re-reading from offset 0 during leadership changes.

Proposal:
A dual-topic approach that introduces a new __remote_log_metadata_compacted topic using log compaction with deterministic offset-based keys, while preserving the existing topic for audit history; this allows brokers to build their metadata cache exclusively from the compacted topic, enables cleanup of expired segment metadata through tombstones, and includes a migration strategy to populate the new topic during upgrade, delivering bounded metadata growth and faster broker startup while maintaining backward compatibility.

More details are in the attached KIP link.
Looking forward to your thoughts.

Thank you for your time!

Best,
Lijun Tong
