Hi Lijun,
Sorry for the late reply. Went over the KIP again. Overall LGTM. Few
points:
> This KIP aims to solve this issue through introducing another compacted
topic for the brokers to bootstrap the state from
1. Shall we update the motivation section to reflect that another topic is
no longer introduced?
2. Assume that the topic is enabled with compaction and only one event is
maintained per segment. If there is a transient error in the remote log
deletion, then the COPY_SEGMENT_STARTED / COPY_SEGMENT_FINISHED events
might be compacted away by the DELETE_SEGMENT_STARTED event. If the broker
is restarted during this time, will there be dangling remote log segments?
Currently, during restart, the broker discards the events for a segment if
it does not see the COPY_SEGMENT_STARTED event.
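To make the compaction hazard in point 2 concrete, here is a toy sketch
(the key format and event names follow the KIP discussion; the compaction
model itself is simplified and purely illustrative):

```python
# Toy model of per-key log compaction (illustrative only: the real log
# cleaner works on segments and dirty ratios, not on a dict).

def compact(records):
    """Keep only the latest value per key, as compaction eventually does."""
    latest = {}
    for key, value in records:
        latest[key] = value  # a later record shadows all earlier ones
    return list(latest.items())

# One logical segment; all of its state events share a single key under the
# proposed TopicIdPartition:EndOffset:BrokerLeaderEpoch scheme.
seg_key = "topicA-0:1000:5"
log = [
    (seg_key, "COPY_SEGMENT_STARTED"),
    (seg_key, "COPY_SEGMENT_FINISHED"),
    (seg_key, "DELETE_SEGMENT_STARTED"),  # remote deletion then fails transiently
]

# After compaction only DELETE_SEGMENT_STARTED survives for this key, so a
# broker bootstrapping from the compacted topic never sees the COPY events
# even though the remote object may still exist.
print(compact(log))  # -> [('topicA-0:1000:5', 'DELETE_SEGMENT_STARTED')]
```

If the restarting broker drops segments that lack a COPY_SEGMENT_STARTED
event, the segment disappears from its view while the bytes remain in
remote storage, which is the dangling-segment concern above.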
Thanks,
Kamal
On Thu, Mar 26, 2026 at 5:08 AM Lijun Tong <[email protected]> wrote:
> Hi,
>
> I have started a Vote thread for this KIP, considering that all questions
> raised so far have been answered. I am happy to continue the discussion if
> needed; otherwise, this is a friendly reminder to vote on this KIP.
>
> Thanks,
> Lijun Tong
>
>
>
> On Mon, Jan 19, 2026 at 5:59 PM Lijun Tong <[email protected]> wrote:
>
> > Hey Kamal,
> >
> > Thanks for raising these questions. Here are my responses to your
> > questions:
> > Q1 and Q2:
> > I think both questions boil down to how to release this new feature, and
> > both are valid concerns. The solution I have in mind is that this feature
> > is *gated by the metadata version*. The new tombstone semantics and the
> > additional fields (for example in RemoteLogSegmentUpdateRecord) are only
> > enabled once the cluster metadata version is upgraded to the version that
> > introduces this feature. As long as the cluster metadata version is not
> > bumped, the system will not produce tombstone records. Therefore, during
> > rolling upgrades (mixed 4.2/4.3 brokers), the feature remains effectively
> > disabled. Tombstones will only start being produced after the metadata
> > version is upgraded, at which point all brokers are already required to
> > support the new behavior.
> >
> > Since Kafka does not support metadata version downgrades at the moment,
> > once a metadata version that supports this feature is enabled, it cannot
> > be downgraded to a version that does not support it. I will add these
> > details to the KIP later.
> > Q3. This is an *editing mistake* in the KIP. Thanks for pointing it out.
> > The enum value has already been corrected in the latest revision to
> > remove the unused placeholder and keep the state values contiguous and
> > consistent.
> > Q4. So far I don't foresee the quota mechanism interfering with the state
> > transition in any way; let me know if any concern comes to mind.
> >
> > Thanks,
> > Lijun
> >
> > On Sun, Jan 18, 2026 at 12:40 AM Kamal Chandraprakash
> > <[email protected]> wrote:
> >
> >> Hi Lijun,
> >>
> >> Thanks for updating the KIP!
> >>
> >> The updated migration plan looks clean to me. Few questions:
> >>
> >> 1. The ConsumerTask in the 4.2 Kafka build does not handle tombstone
> >> records. Should the tombstone records be sent only when all the brokers
> >> are upgraded to the 4.3 version?
> >>
> >> 2. Once all the brokers are upgraded and the __remote_log_metadata topic
> >> cleanup policy is changed to compact, downgrading the brokers is not
> >> allowed, as records without a key will throw an error when produced to
> >> the compacted topic. Shall we mention this in the compatibility section?
> >>
> >> 3. In the RemoteLogSegmentState enum, why is the value 1 marked as
> >> unused?
> >>
> >> 4. Regarding the key (TopicIdPartition:EndOffset:BrokerLeaderEpoch), we
> >> may have to check for scenarios where there is segment lag due to the
> >> remote log write quota. Will the state transition work correctly? I will
> >> come back to this later.
> >>
> >> Thanks,
> >> Kamal
> >>
> >> On Fri, Jan 16, 2026 at 4:50 AM jian fu <[email protected]> wrote:
> >>
> >> > Hi Lijun and Kamal,
> >> > I also think we don't need to keep the delete policy in the final
> >> > config; if we do, we would always have to remember that all of our
> >> > topic retention times must be less than that value, right? It is a
> >> > protection with risk involved.
> >> > Regards,
> >> > Jian
> >> >
> >> >
> >> >
> >> > On Fri, Jan 16, 2026 at 6:45 AM Lijun Tong <[email protected]> wrote:
> >> >
> >> > > Hey Kamal,
> >> > >
> >> > > Some additional points about Q4:
> >> > >
> >> > > > The user can decide when to change their internal topic cleanup
> >> > > > policy to compact. If someone retains the data in the remote
> >> > > > storage for 3 months, then they can migrate to the compacted topic
> >> > > > after 3 months post rolling out this change. And, update their
> >> > > > cleanup policy to [compact, delete].
> >> > >
> >> > >
> >> > > I don't think it's a good idea to keep delete in the final cleanup
> >> > > policy for the topic `__remote_log_metadata`, as this still requires
> >> > > the user to keep track of the max retention hours of topics that have
> >> > > remote storage enabled, and that is an operational burden. It's also
> >> > > hard to reason about what will happen if the user configures the
> >> > > wrong retention.ms. I hope this makes sense.
> >> > >
> >> > >
> >> > > Thanks,
> >> > > Lijun Tong
> >> > >
> >> > > On Thu, Jan 15, 2026 at 11:43 AM Lijun Tong <[email protected]> wrote:
> >> > >
> >> > > > Hey Kamal,
> >> > > >
> >> > > > Thanks for your reply! I am glad we are now on the same page about
> >> > > > making the __remote_log_metadata topic's compaction optional for
> >> > > > the user; I will update the KIP with this change.
> >> > > >
> >> > > > For Q2:
> >> > > > With the key designed as
> >> > > > TopicId:Partition:EndOffset:BrokerLeaderEpoch, even if the same
> >> > > > broker retries the upload multiple times for the same log segment,
> >> > > > the latest retry attempt with the latest segment UUID will
> >> > > > overwrite the previous attempts' values since they share the same
> >> > > > key, so we don't need to explicitly track the failed upload
> >> > > > metadata, because it's already overwritten by the later attempt.
> >> > > > That's my understanding of the RLMCopyTask; correct me if I am
> >> > > > wrong.
> >> > > >
> >> > > > Thanks,
> >> > > > Lijun Tong
> >> > > >
> >> > > > On Wed, Jan 14, 2026 at 9:18 PM Kamal Chandraprakash
> >> > > > <[email protected]> wrote:
> >> > > >
> >> > > >> Hi Lijun,
> >> > > >>
> >> > > >> Thanks for the reply!
> >> > > >>
> >> > > >> Q1: Sounds good. Could you clarify in the KIP that the same
> >> > > >> partitioner will be used?
> >> > > >>
> >> > > >> Q2: With the TopicId:Partition:EndOffset:BrokerLeaderEpoch key,
> >> > > >> if the same broker retries the upload due to intermittent issues
> >> > > >> in object storage (or) RLMM, then that failed upload metadata
> >> > > >> also needs to be cleared.
> >> > > >>
> >> > > >> Q3: We may have to skip the null-value records in the
> >> > > >> ConsumerTask.
> >> > > >>
> >> > > >> Q4a: The idea is to keep the cleanup policy as "delete" and also
> >> > > >> send the tombstone markers to the existing `__remote_log_metadata`
> >> > > >> topic. And, handle the tombstone records in the ConsumerTask.
> >> > > >>
> >> > > >> The user can decide when to change their internal topic cleanup
> >> > > >> policy to compact. If someone retains the data in the remote
> >> > > >> storage for 3 months, then they can migrate to the compacted topic
> >> > > >> after 3 months post rolling out this change. And, update their
> >> > > >> cleanup policy to [compact, delete].
> >> > > >>
> >> > > >> Thanks,
> >> > > >> Kamal
> >> > > >>
> >> > > >> On Thu, Jan 15, 2026 at 4:12 AM Lijun Tong <[email protected]>
> >> > > >> wrote:
> >> > > >>
> >> > > >> > Hey Jian,
> >> > > >> >
> >> > > >> > Thanks for taking the time to review this KIP. I appreciate
> >> > > >> > your proposing a simpler migration solution to onboard the new
> >> > > >> > feature.
> >> > > >> >
> >> > > >> > There are 2 points that I think can be further refined:
> >> > > >> >
> >> > > >> > 1). Make the compacted topic optional, although the new feature
> >> > > >> > will continue to emit tombstone messages for expired log
> >> > > >> > segments even while the topic is still in time-based retention
> >> > > >> > mode, so once the user has switched to the compacted topic,
> >> > > >> > those expired messages can still be deleted even though the
> >> > > >> > topic is no longer retention based.
> >> > > >> > 2). We need to expose some flag to the user to indicate whether
> >> > > >> > the topic can be flipped to compacted, by checking whether all
> >> > > >> > the old-format key-less messages have expired, and allow the
> >> > > >> > user to choose to flip to compacted only when the flag is true.
> >> > > >> >
> >> > > >> > Thanks for sharing your idea. I will update the KIP later with
> >> > > >> > this new idea.
> >> > > >> >
> >> > > >> > Best,
> >> > > >> > Lijun Tong
> >> > > >> >
> >> > > >> >
> >> > > >> > On Mon, Jan 12, 2026 at 4:55 AM jian fu <[email protected]> wrote:
> >> > > >> >
> >> > > >> > > Hi Lijun Tong:
> >> > > >> > >
> >> > > >> > > Thanks for your KIP, which raises this critical issue.
> >> > > >> > >
> >> > > >> > > What about just keeping one topic instead of involving another
> >> > > >> > > topic? For the existing topic data's migration, maybe we can
> >> > > >> > > solve the issue this way:
> >> > > >> > > (1) set the retention time greater than the retention time of
> >> > > >> > > every topic that enables remote storage
> >> > > >> > > (2) deploy the new Kafka version with the feature that sends
> >> > > >> > > the messages with a key
> >> > > >> > > (3) wait until all the old messages have expired and new keyed
> >> > > >> > > messages are coming into the topic
> >> > > >> > > (4) convert the topic to compact
> >> > > >> > >
> >> > > >> > > I haven't tested it; I am just proposing this solution based
> >> > > >> > > on code review, for your reference.
> >> > > >> > > The steps may be a little complex, but it avoids adding a new
> >> > > >> > > topic.
> >> > > >> > >
> >> > > >> > > Regards
> >> > > >> > > Jian
> >> > > >> > >
> >> > > >> > > On Thu, Jan 8, 2026 at 9:17 AM Lijun Tong <[email protected]> wrote:
> >> > > >> > >
> >> > > >> > > > Hey Kamal,
> >> > > >> > > >
> >> > > >> > > >
> >> > > >> > > > Thanks for taking the time to review.
> >> > > >> > > >
> >> > > >> > > >
> >> > > >> > > > Here is my response to your questions:
> >> > > >> > > >
> >> > > >> > > > Q1 At this point, I don’t see a need to change
> >> > > >> > > > RemoteLogMetadataTopicPartitioner for this design. Nothing
> >> > > >> > > > in the current approach appears to require a partitioner
> >> > > >> > > > change, but I’m open to revisiting if a concrete need
> >> > > >> > > > arises.
> >> > > >> > > >
> >> > > >> > > > Q2 I have some reservations about using SegmentId:State as
> >> > > >> > > > the key. A practical challenge we see today is that the same
> >> > > >> > > > logical segment can be retried multiple times with different
> >> > > >> > > > SegmentIds across brokers. If the key is SegmentId-based, it
> >> > > >> > > > becomes harder to discover and tombstone all related
> >> > > >> > > > attempts when the segment eventually expires. The
> >> > > >> > > > TopicId:Partition:EndOffset:BrokerLeaderEpoch key is
> >> > > >> > > > deterministic for a logical segment attempt and helps group
> >> > > >> > > > retries by epoch, which simplifies cleanup and reasoning
> >> > > >> > > > about state. I’d love to understand the benefits you’re
> >> > > >> > > > seeing with SegmentId:State compared to the
> >> > > >> > > > offset/epoch-based key so we can weigh the trade-offs.
> >> > > >> > > >
> >> > > >> > > > On partitioning: with this proposal, all states for a given
> >> > > >> > > > user topic-partition still map to the same metadata
> >> > > >> > > > partition. That remains true for the existing
> >> > > >> > > > __remote_log_metadata (unchanged partitioner) and for the
> >> > > >> > > > new __remote_log_metadata_compacted, preserving the
> >> > > >> > > > properties RemoteMetadataCache relies on.
> >> > > >> > > >
> >> > > >> > > > Q3 It should be fine for ConsumerTask to ignore tombstone
> >> > > >> > > > records (null values) and no-op.
> >> > > >> > > >
> >> > > >> > > > Q4 Although TBRLMM is a sample RLMM implementation, it’s
> >> > > >> > > > currently the only OSS option and is widely used. The new
> >> > > >> > > > __remote_log_metadata_compacted topic offers clear
> >> > > >> > > > operational benefits in that context. We can also provide a
> >> > > >> > > > configuration to let users choose whether they want to keep
> >> > > >> > > > the audit topic (__remote_log_metadata) in their cluster.
> >> > > >> > > >
> >> > > >> > > > Q4a Enabling compaction on __remote_log_metadata alone may
> >> > > >> > > > not fully address the unbounded growth, since we also need
> >> > > >> > > > to emit tombstones for expired keys to delete them.
> >> > > >> > > > Deferring compaction and tombstoning to user configuration
> >> > > >> > > > could make the code flow complicated, add operational
> >> > > >> > > > complexity, and make outcomes less predictable. The proposal
> >> > > >> > > > aims to provide a consistent experience by defining
> >> > > >> > > > deterministic keys and emitting tombstones as part of the
> >> > > >> > > > broker’s responsibilities, while still allowing users to opt
> >> > > >> > > > out of the audit topic if they prefer. But I am open to more
> >> > > >> > > > discussion if there is any concrete need I don't foresee.
> >> > > >> > > >
> >> > > >> > > >
> >> > > >> > > > Thanks,
> >> > > >> > > >
> >> > > >> > > > Lijun Tong
> >> > > >> > > >
> >> > > >> > > > On Tue, Jan 6, 2026 at 1:01 AM Kamal Chandraprakash
> >> > > >> > > > <[email protected]> wrote:
> >> > > >> > > >
> >> > > >> > > > > Hi Lijun,
> >> > > >> > > > >
> >> > > >> > > > > Thanks for the KIP! Went over the first pass.
> >> > > >> > > > >
> >> > > >> > > > > Few Questions:
> >> > > >> > > > >
> >> > > >> > > > > 1. Are we going to maintain the same
> >> > > >> > > > > RemoteLogMetadataTopicPartitioner
> >> > > >> > > > > <https://sourcegraph.com/github.com/apache/kafka/-/blob/storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/RemoteLogMetadataTopicPartitioner.java>
> >> > > >> > > > > for both the topics? It is not clear in the KIP; could
> >> > > >> > > > > you clarify it?
> >> > > >> > > > > 2. Can the key be changed to SegmentId:State instead of
> >> > > >> > > > > TopicId:Partition:EndOffset:BrokerLeaderEpoch if the same
> >> > > >> > > > > partitioner is used? It is good to maintain all the
> >> > > >> > > > > segment states for a user-topic-partition in the same
> >> > > >> > > > > metadata partition.
> >> > > >> > > > > 3. Should we have to handle the records with null value
> >> > > >> > > > > (tombstone) in the ConsumerTask
> >> > > >> > > > > <https://sourcegraph.com/github.com/apache/kafka/-/blob/storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/ConsumerTask.java?L166>
> >> > > >> > > > > ?
> >> > > >> > > > > 4. TBRLMM
> >> > > >> > > > > <https://sourcegraph.com/github.com/apache/kafka/-/blob/storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/TopicBasedRemoteLogMetadataManager.java>
> >> > > >> > > > > is a sample plugin implementation of RLMM. Not sure
> >> > > >> > > > > whether the community will agree to add one more internal
> >> > > >> > > > > topic for this plugin impl.
> >> > > >> > > > > 4a. Can we modify the new messages to the
> >> > > >> > > > > __remote_log_metadata topic to contain the key and leave
> >> > > >> > > > > it to the user to enable compaction for this topic if
> >> > > >> > > > > they need?
> >> > > >> > > > >
> >> > > >> > > > > Thanks,
> >> > > >> > > > > Kamal
> >> > > >> > > > >
> >> > > >> > > > > On Tue, Jan 6, 2026 at 7:35 AM Lijun Tong
> >> > > >> > > > > <[email protected]> wrote:
> >> > > >> > > > >
> >> > > >> > > > > > Hey Henry,
> >> > > >> > > > > >
> >> > > >> > > > > > Thank you for your time and response! I really like
> >> > > >> > > > > > your KIP-1248 about offloading the consumption of
> >> > > >> > > > > > remote log away from the broker, and I think with that
> >> > > >> > > > > > change, topics that enable tiered storage can also have
> >> > > >> > > > > > longer retention configurations and would benefit from
> >> > > >> > > > > > this KIP too.
> >> > > >> > > > > >
> >> > > >> > > > > > > Some suggestions: In your example scenarios, it would
> >> > > >> > > > > > > also be good to add an example of remote log segment
> >> > > >> > > > > > > deletion triggered by retention policy which will
> >> > > >> > > > > > > trigger generation of tombstone event into metadata
> >> > > >> > > > > > > topic and trigger log compaction/deletion 24 hours
> >> > > >> > > > > > > later, I think this is the key event to cap the
> >> > > >> > > > > > > metadata topic size.
> >> > > >> > > > > >
> >> > > >> > > > > >
> >> > > >> > > > > > Regarding this suggestion, I am not sure whether
> >> > > >> > > > > > Scenario 4
> >> > > >> > > > > > <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406618613#KIP1266:BoundingTheNumberOfRemoteLogMetadataMessagesviaCompactedTopic-Scenario4:SegmentDeletion>
> >> > > >> > > > > > has covered it. I can add more rows in the Timeline
> >> > > >> > > > > > Table, like T5+24hour, to indicate the messages are
> >> > > >> > > > > > gone by then and explicitly show that messages are
> >> > > >> > > > > > deleted, thus the number of messages in the topic is
> >> > > >> > > > > > capped.
> >> > > >> > > > > >
> >> > > >> > > > > > Regarding whether the topic __remote_log_metadata is
> >> > > >> > > > > > still necessary, I am inclined to keep this topic at
> >> > > >> > > > > > least for debugging purposes, so we can build
> >> > > >> > > > > > confidence in the compacted topic change. We can always
> >> > > >> > > > > > choose to remove this topic in the future once we all
> >> > > >> > > > > > agree it provides limited value for the users.
> >> > > >> > > > > >
> >> > > >> > > > > > Thanks,
> >> > > >> > > > > > Lijun Tong
> >> > > >> > > > > >
> >> > > >> > > > > >
> >> > > >> > > > > > On Mon, Jan 5, 2026 at 4:19 PM Henry Haiying Cai via
> >> > > >> > > > > > dev <[email protected]> wrote:
> >> > > >> > > > > >
> >> > > >> > > > > > > Lijun,
> >> > > >> > > > > > >
> >> > > >> > > > > > > Thanks for the proposal, and I liked your idea of
> >> > > >> > > > > > > using a compacted topic for the tiered storage
> >> > > >> > > > > > > metadata topic.
> >> > > >> > > > > > >
> >> > > >> > > > > > > In our setup, we have set a shorter retention (3
> >> > > >> > > > > > > days) for the tiered storage metadata topic to
> >> > > >> > > > > > > control the size growth. We can do that since we
> >> > > >> > > > > > > control all topics' retention policies in our
> >> > > >> > > > > > > clusters and we set a uniform retention policy for
> >> > > >> > > > > > > all our tiered storage topics. I can see other
> >> > > >> > > > > > > users/companies will not be able to enforce that
> >> > > >> > > > > > > retention policy on all tiered storage topics.
> >> > > >> > > > > > >
> >> > > >> > > > > > > Some suggestions: In your example scenarios, it would
> >> > > >> > > > > > > also be good to add an example of remote log segment
> >> > > >> > > > > > > deletion triggered by retention policy which will
> >> > > >> > > > > > > trigger generation of tombstone event into metadata
> >> > > >> > > > > > > topic and trigger log compaction/deletion 24 hours
> >> > > >> > > > > > > later, I think this is the key event to cap the
> >> > > >> > > > > > > metadata topic size.
> >> > > >> > > > > > >
> >> > > >> > > > > > > For the original unbounded __remote_log_metadata
> >> > > >> > > > > > > topic, I am not sure whether we still need it or not.
> >> > > >> > > > > > > If it is left only for audit trail purposes, people
> >> > > >> > > > > > > can set up a data ingestion pipeline to ingest the
> >> > > >> > > > > > > content of the metadata topic into a separate storage
> >> > > >> > > > > > > location. I think we can have a flag to have only one
> >> > > >> > > > > > > metadata topic (the compacted version).
> >> > > >> > > > > > >
> >> > > >> > > > > > >
> >> > > >> > > > > > > On Monday, January 5, 2026 at 01:22:42 PM PST, Lijun
> >> > > >> > > > > > > Tong <[email protected]> wrote:
> >> > > >> > > > > > >
> >> > > >> > > > > > > Hello Kafka Community,
> >> > > >> > > > > > >
> >> > > >> > > > > > > I would like to start a discussion on KIP-1266, which
> >> > > >> > > > > > > proposes to add a new compacted remote log metadata
> >> > > >> > > > > > > topic for tiered storage, to limit the number of
> >> > > >> > > > > > > messages that need to be iterated to build the remote
> >> > > >> > > > > > > metadata state.
> >> > > >> > > > > > >
> >> > > >> > > > > > > KIP link: KIP-1266 Bounding The Number Of
> >> > > >> > > > > > > RemoteLogMetadata Messages via Compacted
> >> > > >> > > > > > > RemoteLogMetadata Topic
> >> > > >> > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-1266%3A+Bounding+The+Number+Of+RemoteLogMetadata+Messages+via+Compacted+Topic>
> >> > > >> > > > > > >
> >> > > >> > > > > > > Background:
> >> > > >> > > > > > > The current Tiered Storage implementation uses a
> >> > > >> > > > > > > __remote_log_metadata topic with infinite retention
> >> > > >> > > > > > > and a delete-based cleanup policy, causing unbounded
> >> > > >> > > > > > > growth, slow broker bootstrap, no mechanism to clean
> >> > > >> > > > > > > up expired segment metadata, and inefficient
> >> > > >> > > > > > > re-reading from offset 0 during leadership changes.
> >> > > >> > > > > > >
> >> > > >> > > > > > > Proposal:
> >> > > >> > > > > > > A dual-topic approach that introduces a new
> >> > > >> > > > > > > __remote_log_metadata_compacted topic using log
> >> > > >> > > > > > > compaction with deterministic offset-based keys,
> >> > > >> > > > > > > while preserving the existing topic for audit
> >> > > >> > > > > > > history. This allows brokers to build their metadata
> >> > > >> > > > > > > cache exclusively from the compacted topic, enables
> >> > > >> > > > > > > cleanup of expired segment metadata through
> >> > > >> > > > > > > tombstones, and includes a migration strategy to
> >> > > >> > > > > > > populate the new topic during the upgrade, delivering
> >> > > >> > > > > > > bounded metadata growth and faster broker startup
> >> > > >> > > > > > > while maintaining backward compatibility.
> >> > > >> > > > > > >
> >> > > >> > > > > > > More details are in the attached KIP link.
> >> > > >> > > > > > > Looking forward to your thoughts.
> >> > > >> > > > > > >
> >> > > >> > > > > > > Thank you for your time!
> >> > > >> > > > > > >
> >> > > >> > > > > > > Best,
> >> > > >> > > > > > > Lijun Tong
> >> > > >> > > > > > >
> >> > > >> > > > > >
> >> > > >> > > > >
> >> > > >> > > >
> >> > > >> > >
> >> > > >> >
> >> > > >>
> >> > > >
> >> > >
> >> >
> >>
> >
>