Hi Kamal,

Thanks for taking another look at the KIP.
1. I have removed the leftover line about introducing another new topic from
the KIP.
2. Regarding your second question:

> 2. Assume that the topic is enabled with compaction and only one event is
> maintained per segment. If there is a transient error in the remote log
> deletion, then the COPY_SEGMENT_STARTED / COPY_SEGMENT_FINISHED events might
> be compacted by the DELETE_SEGMENT_STARTED events. If the broker is
> restarted during this time, will there be dangling remote log segments?
> Currently, during restart, the broker discards the events if it does not see
> the COPY_SEGMENT_STARTED events.


I am glad you asked this question. I did not cover this in the current design
to avoid distracting from the main flow, but I plan to add another background
thread that cleans up stale messages by comparing each message's endOffset
with the topic partition's log start offset. I believe this would remove all
the dangling messages.
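
To make the idea concrete, here is a rough sketch of that cleanup pass (the
class, record, and method names below are illustrative only, not the actual
RLMM / RemoteLogMetadataCache API):

import java.util.List;
import java.util.function.Consumer;

// Rough sketch of the proposed background cleanup; all names are illustrative.
final class StaleMetadataCleaner {

    // Hypothetical view of one metadata entry: the compacted-topic key plus the
    // end offset of the remote segment it describes.
    record MetadataEntry(String key, long segmentEndOffset) {}

    // An entry is stale once every offset it covers is below the partition's
    // current log start offset.
    static boolean isStale(MetadataEntry entry, long logStartOffset) {
        return entry.segmentEndOffset() < logStartOffset;
    }

    // One pass of the background thread over the cached entries of a user
    // topic-partition. emitTombstone would produce a null-value record for the
    // key so that compaction can eventually drop the dangling entries.
    static void cleanUp(List<MetadataEntry> cachedEntries,
                        long logStartOffset,
                        Consumer<String> emitTombstone) {
        for (MetadataEntry entry : cachedEntries) {
            if (isStale(entry, logStartOffset)) {
                emitTombstone.accept(entry.key());
            }
        }
    }
}

The intent is simply that once a segment's end offset falls below the log
start offset, its key gets a tombstone, so even dangling COPY_SEGMENT /
DELETE_SEGMENT entries are eventually removed by compaction.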

Thanks,
Lijun Tong

Kamal Chandraprakash <[email protected]> 于2026年3月29日周日 22:48写道:

> Hi Lijun,
>
> Sorry for the late reply. Went over the KIP again. Overall LGTM. Few
> points:
>
> > This KIP aims to solve this issue through introducing another compacted
> topic for the brokers to bootstrap the state from
>
> 1. Shall we update the motivation section to mention that another topic is
> not introduced?
> 2. Assume that the topic is enabled with compaction and only one event is
> maintained per segment. If there is a transient error in the remote log
> deletion, then the COPY_SEGMENT_STARTED / COPY_SEGMENT_FINISHED events might
> be compacted by the DELETE_SEGMENT_STARTED events. If the broker is
> restarted during this time, will there be dangling remote log segments?
> Currently, during restart, the broker discards the events if it does not see
> the COPY_SEGMENT_STARTED events.
>
> Thanks,
> Kamal
>
> On Thu, Mar 26, 2026 at 5:08 AM Lijun Tong <[email protected]>
> wrote:
>
> > Hi,
> >
> > I have started a Vote thread for this KIP, considering all questions
> raised
> > so far have been answered. I am happy to continue the discussion if
> needed,
> > otherwise, this is a friendly reminder on the vote for this KIP.
> >
> > Thanks,
> > Lijun Tong
> >
> >
> >
> > Lijun Tong <[email protected]> 于2026年1月19日周一 17:59写道:
> >
> > > Hey Kamal,
> > >
> > > Thanks for raising these questions. Here are my responses to your
> > > questions:
> > > Q1 and Q2:
> > > I think both questions boil down to how to release this new feature,
> both
> > > questions are valid concerns. The solution I have in mind is this
> feature
> > > is *gated by the metadata version*. The new tombstone semantics and the
> > > additional fields (for example in RemoteLogSegmentUpdateRecord) are
> only
> > > enabled once the cluster metadata version is upgraded to the version
> that
> > > introduces this feature. As long as the cluster metadata version is not
> > > bumped, the system will not produce tombstone records. Therefore,
> during
> > > rolling upgrades (mixed 4.2/4.3 brokers), the feature remains
> effectively
> > > disabled. Tombstones will only start being produced after the metadata
> > > version is upgraded, at which point all brokers are already required to
> > > support the new behavior.
> > >
> > > Since Kafka does not support metadata version downgrades at the moment,
> > > once a metadata version that supports this feature is enabled, it
> cannot
> > be
> > > downgraded to a version that does not support it. I will add these
> > details
> > > to the KIP later.
> > > Q3. This is an *editing mistake* in the KIP. Thanks for pointing it
> out —
> > > the enum value has already been corrected in the latest revision to
> > remove
> > > the unused placeholder and keep the state values contiguous and
> > consistent.
> > > Q4. I don't foresee the quota mechanism interfering with the state
> > > transition in any way so far; let me know if any concerns come up.
> > >
> > > Thanks,
> > > Lijun
> > >
> > > On Sun, Jan 18, 2026 at 12:40 AM Kamal Chandraprakash <[email protected]> wrote:
> > >
> > >> Hi Lijun,
> > >>
> > >> Thanks for updating the KIP!
> > >>
> > >> The updated migration plan looks clean to me. Few questions:
> > >>
> > >> 1. The ConsumerTask in 4.2 Kafka build does not handle the tombstone
> > >> records. Should the tombstone records be sent only when all the
> brokers
> > >> are
> > >> upgraded to 4.3 version?
> > >>
> > >> 2. Once all the brokers are upgraded and the __remote_log_metadata
> topic
> > >> cleanup policy changed to compact. Then, downgrading the brokers is
> not
> > >> allowed as the records without key will throw an error while producing
> > the
> > >> compacted topic. Shall we mention this in the compatibility section?
> > >>
> > >> 3. In the RemoteLogSegmentState Enum, why is the value 1 marked as
> > unused?
> > >>
> > >> 4. Regarding the key (TopicIdPartition:EndOffset:BrokerLeaderEpoch),
> we
> > >> may
> > >> have to check for scenarios where there is segment lag due to remote
> log
> > >> write quota. Will the state transition work correctly? Will come back
> to
> > >> this later.
> > >>
> > >> Thanks,
> > >> Kamal
> > >>
> > >> On Fri, Jan 16, 2026 at 4:50 AM jian fu <[email protected]> wrote:
> > >>
> > >> > Hi Lijun and Kamal
> > >> > I also think we don't need to keep the delete policy in the final
> > >> > config; if we do, we would always have to remember that every topic's
> > >> > retention time must be less than that value, right? It is a protection
> > >> > with risk involved.
> > >> > Regards
> > >> > Jian
> > >> >
> > >> >
> > >> >
> > >> > Lijun Tong <[email protected]>于2026年1月16日 周五06:45写道:
> > >> >
> > >> > > Hey Kamal,
> > >> > >
> > >> > > Some additional points about the Q4,
> > >> > >
> > >> > > > The user can decide when to change their internal topic cleanup
> > >> policy
> > >> > to
> > >> > > > compact. If someone retains
> > >> > > > the data in the remote storage for 3 months, then they can
> migrate
> > >> to
> > >> > the
> > >> > > > compacted topic after 3 months
> > >> > > > post rolling out this change. And, update their cleanup policy
> to
> > >> > > [compact,
> > >> > > > delete].
> > >> > >
> > >> > >
> > >> > > I don't think it's a good idea to keep delete in the final cleanup
> > >> > > policy for the topic `__remote_log_metadata`, as this still requires
> > >> > > the user to keep track of the max retention hours of topics that have
> > >> > > remote storage enabled, and it's an operational burden. It's also hard
> > >> > > to reason about what will happen if the user configures the wrong
> > >> > > retention.ms. I hope this makes sense.
> > >> > >
> > >> > >
> > >> > > Thanks,
> > >> > > Lijun Tong
> > >> > >
> > >> > > Lijun Tong <[email protected]> 于2026年1月15日周四 11:43写道:
> > >> > >
> > >> > > > Hey Kamal,
> > >> > > >
> > >> > > > Thanks for your reply! I am glad we are on the same page with
> > making
> > >> > the
> > >> > > > __remote_log_metadata topic compacted optional for the user
> now, I
> > >> will
> > >> > > > update the KIP with this change.
> > >> > > >
> > >> > > > For the Q2:
> > >> > > > With the key designed as
> > >> > > > TopicId:Partition:EndOffset:BrokerLeaderEpoch, even if the same
> > >> > > > broker retries the upload multiple times for the same log segment,
> > >> > > > the latest retry attempt with the latest segment UUID will overwrite
> > >> > > > the previous attempts' value since they share the same key, so we
> > >> > > > don't need to explicitly track the failed upload metadata; it is
> > >> > > > already gone by the later attempt. That's my understanding of the
> > >> > > > RLMCopyTask; correct me if I am wrong.
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Lijun Tong
> > >> > > >
> > >> > > > On Wed, Jan 14, 2026 at 9:18 PM Kamal Chandraprakash <[email protected]> wrote:
> > >> > > >
> > >> > > >> Hi Lijun,
> > >> > > >>
> > >> > > >> Thanks for the reply!
> > >> > > >>
> > >> > > >> Q1: Sounds good. Could you clarify it in the KIP that the same
> > >> > > partitioner
> > >> > > >> will be used?
> > >> > > >>
> > >> > > >> Q2: With TopicId:Partition:EndOffset:BrokerLeaderEpoch key, if
> > the
> > >> > same
> > >> > > >> broker retries the upload due to intermittent
> > >> > > >> issues in object storage (or) RLMM. Then, those failed upload
> > >> metadata
> > >> > > >> also
> > >> > > >> need to be cleared.
> > >> > > >>
> > >> > > >> Q3: We may have to skip the null value records in the
> > ConsumerTask.
> > >> > > >>
> > >> > > >> Q4a: The idea is to keep the cleanup policy as "delete" and
> also
> > >> send
> > >> > > the
> > >> > > >> tombstone markers
> > >> > > >> to the existing `__remote_log_metadata` topic. And, handle the
> > >> > tombstone
> > >> > > >> records in the ConsumerTask.
> > >> > > >>
> > >> > > >> The user can decide when to change their internal topic cleanup
> > >> policy
> > >> > > to
> > >> > > >> compact. If someone retains
> > >> > > >> the data in the remote storage for 3 months, then they can
> > migrate
> > >> to
> > >> > > the
> > >> > > >> compacted topic after 3 months
> > >> > > >> post rolling out this change. And, update their cleanup policy
> to
> > >> > > >> [compact,
> > >> > > >> delete].
> > >> > > >>
> > >> > > >> Thanks,
> > >> > > >> Kamal
> > >> > > >>
> > >> > > >> On Thu, Jan 15, 2026 at 4:12 AM Lijun Tong <
> > >> [email protected]>
> > >> > > >> wrote:
> > >> > > >>
> > >> > > >> > Hey Jian,
> > >> > > >> >
> > >> > > >> > Thanks for your time to review this KIP. I appreciate that
> you
> > >> > > propose a
> > >> > > >> > simpler migration solution to onboard the new feature.
> > >> > > >> >
> > >> > > >> > There are 2 points that I think can be further refined on:
> > >> > > >> >
> > >> > > >> > 1) Make the compacted topic optional, although the new feature
> > >> > > >> > will continue to emit tombstone messages for expired log segments
> > >> > > >> > even while the topic is still in time-based retention mode, so
> > >> > > >> > once the user switches to the compacted topic, those expired
> > >> > > >> > messages can still be deleted even though the topic is no longer
> > >> > > >> > retention based.
> > >> > > >> > 2) We need to expose some flag to the user to indicate whether the
> > >> > > >> > topic can be flipped to compacted, by checking whether all the old
> > >> > > >> > key-less messages have expired, and allow the user to flip to
> > >> > > >> > compacted only when the flag is true.
> > >> > > >> >
> > >> > > >> > Thanks for sharing your idea. I will update the KIP later
> with
> > >> this
> > >> > > new
> > >> > > >> > idea.
> > >> > > >> >
> > >> > > >> > Best,
> > >> > > >> > Lijun Tong
> > >> > > >> >
> > >> > > >> >
> > >> > > >> > On Mon, Jan 12, 2026 at 4:55 AM jian fu <[email protected]> wrote:
> > >> > > >> >
> > >> > > >> > > Hi  Lijun Tong:
> > >> > > >> > >
> > >> > > >> > > Thanks for your KIP, which raises this critical issue.
> > >> > > >> > >
> > >> > > >> > > What about keeping just one topic instead of introducing another
> > >> > > >> > > topic? For the existing topic data's migration, maybe we can use
> > >> > > >> > > this approach to solve the issue:
> > >> > > >> > > (1) set the metadata topic's retention time greater than the
> > >> > > >> > > retention time of every topic that has remote storage enabled
> > >> > > >> > > (2) deploy the new Kafka version with the feature that sends the
> > >> > > >> > > messages with keys
> > >> > > >> > > (3) wait until all the old messages have expired and new keyed
> > >> > > >> > > messages are arriving in the topic
> > >> > > >> > > (4) convert the topic to compact
> > >> > > >> > >
> > >> > > >> > > I haven't tested it; I am just proposing this solution based on
> > >> > > >> > > a code review, for your reference. The steps may be a little
> > >> > > >> > > complex, but it avoids adding a new topic.
> > >> > > >> > >
> > >> > > >> > > Regards
> > >> > > >> > > Jian
> > >> > > >> > >
> > >> > > >> > > On Thu, Jan 8, 2026 at 9:17 AM Lijun Tong <[email protected]> wrote:
> > >> > > >> > >
> > >> > > >> > > > Hey Kamal,
> > >> > > >> > > >
> > >> > > >> > > >
> > >> > > >> > > > Thanks for your time for the review.
> > >> > > >> > > >
> > >> > > >> > > >
> > >> > > >> > > > Here is my response to your questions:
> > >> > > >> > > >
> > >> > > >> > > > Q1 At this point, I don’t see a need to change
> > >> > > >> > > > RemoteLogMetadataTopicPartitioner for this design.
> Nothing
> > in
> > >> > the
> > >> > > >> > current
> > >> > > >> > > > approach appears to require a partitioner change, but I’m
> > >> open
> > >> > to
> > >> > > >> > > > revisiting if a concrete need arises.
> > >> > > >> > > >
> > >> > > >> > > > Q2 I have some reservations about using SegmentId:State
> as
> > >> the
> > >> > > key.
> > >> > > >> A
> > >> > > >> > > > practical challenge we see today is that the same logical
> > >> > segment
> > >> > > >> can
> > >> > > >> > be
> > >> > > >> > > > retried multiple times with different SegmentIds across
> > >> brokers.
> > >> > > If
> > >> > > >> the
> > >> > > >> > > key
> > >> > > >> > > > is SegmentId-based, it becomes harder to discover and
> > >> tombstone
> > >> > > all
> > >> > > >> > > related
> > >> > > >> > > > attempts when the segment eventually expires. The
> > >> > > >> > > > TopicId:Partition:EndOffset:BrokerLeaderEpoch key is
> > >> > deterministic
> > >> > > >> for
> > >> > > >> > a
> > >> > > >> > > > logical segment attempt and helps group retries by epoch,
> > >> which
> > >> > > >> > > simplifies
> > >> > > >> > > > cleanup and reasoning about state. I’d love to understand
> > the
> > >> > > >> benefits
> > >> > > >> > > > you’re seeing with SegmentId:State compared to the
> > >> > > >> offset/epoch-based
> > >> > > >> > key
> > >> > > >> > > > so we can weigh the trade-offs.
> > >> > > >> > > >
> > >> > > >> > > > On partitioning: with this proposal, all states for a
> given
> > >> user
> > >> > > >> > > > topic-partition still map to the same metadata partition.
> > >> That
> > >> > > >> remains
> > >> > > >> > > true
> > >> > > >> > > > for the existing __remote_log_metadata (unchanged
> > >> partitioner)
> > >> > and
> > >> > > >> for
> > >> > > >> > > the
> > >> > > >> > > > new __remote_log_metadata_compacted, preserving the
> > >> properties
> > >> > > >> > > > RemoteMetadataCache relies on.
> > >> > > >> > > >
> > >> > > >> > > > Q3 It should be fine for ConsumerTask to ignore tombstone
> > >> > records
> > >> > > >> (null
> > >> > > >> > > > values) and no-op.
> > >> > > >> > > >
> > >> > > >> > > > Q4 Although TBRLMM is a sample RLMM implementation, it’s
> > >> > currently
> > >> > > >> the
> > >> > > >> > > only
> > >> > > >> > > > OSS option and is widely used. The new
> > >> > > >> __remote_log_metadata_compacted
> > >> > > >> > > > topic offers clear operational benefits in that context.
> We
> > >> can
> > >> > > also
> > >> > > >> > > > provide a configuration to let users choose whether they
> > >> want to
> > >> > > >> keep
> > >> > > >> > the
> > >> > > >> > > > audit topic (__remote_log_metadata) in their cluster.
> > >> > > >> > > >
> > >> > > >> > > > Q4a Enabling compaction on __remote_log_metadata alone
> may
> > >> not
> > >> > > fully
> > >> > > >> > > > address the unbounded growth, since we also need to emit
> > >> > > tombstones
> > >> > > >> for
> > >> > > >> > > > expired keys to delete them. Deferring compaction and
> > >> > tombstoning
> > >> > > to
> > >> > > >> > user
> > >> > > >> > > > configuration could make the code flow complicated, also
> > add
> > >> > > >> > operational
> > >> > > >> > > > complexity and make outcomes less predictable. The
> proposal
> > >> aims
> > >> > > to
> > >> > > >> > > provide
> > >> > > >> > > > a consistent experience by defining deterministic keys
> and
> > >> > > emitting
> > >> > > >> > > > tombstones as part of the broker’s responsibilities,
> while
> > >> still
> > >> > > >> > allowing
> > >> > > >> > > > users to opt out of the audit topic if they prefer. But I
> > am
> > >> > open
> > >> > > to
> > >> > > >> > more
> > >> > > >> > > > discussion if there is any concrete need I don't foresee.
> > >> > > >> > > >
> > >> > > >> > > >
> > >> > > >> > > > Thanks,
> > >> > > >> > > >
> > >> > > >> > > > Lijun Tong
> > >> > > >> > > >
> > >> > > >> > > > On Tue, Jan 6, 2026 at 1:01 AM Kamal Chandraprakash <[email protected]> wrote:
> > >> > > >> > > >
> > >> > > >> > > > > Hi Lijun,
> > >> > > >> > > > >
> > >> > > >> > > > > Thanks for the KIP! Went over the first pass.
> > >> > > >> > > > >
> > >> > > >> > > > > Few Questions:
> > >> > > >> > > > >
> > >> > > >> > > > > 1. Are we going to maintain the same
> > >> > > >> > RemoteLogMetadataTopicPartitioner
> > >> > > >> > > > > <https://sourcegraph.com/github.com/apache/kafka/-/blob/storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/RemoteLogMetadataTopicPartitioner.java>
> > >> > > >> > > > > for both the topics? It is not clear in the KIP, could
> > you
> > >> > > clarify
> > >> > > >> > it?
> > >> > > >> > > > > 2. Can the key be changed to SegmentId:State instead of
> > >> > > >> > > > > TopicId:Partition:EndOffset:BrokerLeaderEpoch if the
> same
> > >> > > >> partitioner
> > >> > > >> > > is
> > >> > > >> > > > > used? It is good to maintain all the segment states
> for a
> > >> > > >> > > > > user-topic-partition in the same metadata partition.
> > >> > > >> > > > > 3. Should we have to handle the records with null value
> > >> > > >> (tombstone)
> > >> > > >> > in
> > >> > > >> > > > the
> > >> > > >> > > > > ConsumerTask
> > >> > > >> > > > > <https://sourcegraph.com/github.com/apache/kafka/-/blob/storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/ConsumerTask.java?L166>?
> > >> > > >> > > > > 4. TBRLMM
> > >> > > >> > > > > <https://sourcegraph.com/github.com/apache/kafka/-/blob/storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/TopicBasedRemoteLogMetadataManager.java>
> > >> > > >> > > > > is a sample plugin implementation of RLMM. Not sure
> > whether
> > >> > the
> > >> > > >> > > community
> > >> > > >> > > > > will agree to add one more internal topic for this
> plugin
> > >> > impl.
> > >> > > >> > > > > 4a. Can we modify the new messages to the
> > >> > __remote_log_metadata
> > >> > > >> topic
> > >> > > >> > > to
> > >> > > >> > > > > contain the key and leave it to the user to enable
> > >> compaction
> > >> > > for
> > >> > > >> > this
> > >> > > >> > > > > topic if they need?
> > >> > > >> > > > >
> > >> > > >> > > > > Thanks,
> > >> > > >> > > > > Kamal
> > >> > > >> > > > >
> > >> > > >> > > > > On Tue, Jan 6, 2026 at 7:35 AM Lijun Tong <
> > >> > > >> [email protected]>
> > >> > > >> > > > wrote:
> > >> > > >> > > > >
> > >> > > >> > > > > > Hey Henry,
> > >> > > >> > > > > >
> > >> > > >> > > > > > Thank you for your time and response! I really like
> > your
> > >> > > >> KIP-1248
> > >> > > >> > > about
> > >> > > >> > > > > > offloading the consumption of remote log away from
> the
> > >> > broker,
> > >> > > >> and
> > >> > > >> > I
> > >> > > >> > > > > think
> > >> > > >> > > > > > with that change, the topic that enables the tiered
> > >> storage
> > >> > > can
> > >> > > >> > also
> > >> > > >> > > > have
> > >> > > >> > > > > > longer retention configurations and would benefit
> from
> > >> this
> > >> > > KIP
> > >> > > >> > too.
> > >> > > >> > > > > >
> > >> > > >> > > > > > Some suggestions: In your example scenarios, it would
> > >> also
> > >> > be
> > >> > > >> good
> > >> > > >> > to
> > >> > > >> > > > add
> > >> > > >> > > > > > > an example of remote log segment deletion triggered
> > by
> > >> > > >> retention
> > >> > > >> > > > policy
> > >> > > >> > > > > > > which will trigger generation of tombstone event
> into
> > >> > > metadata
> > >> > > >> > > topic
> > >> > > >> > > > > and
> > >> > > >> > > > > > > trigger log compaction/deletion 24 hour later, I
> > think
> > >> > this
> > >> > > is
> > >> > > >> > the
> > >> > > >> > > > key
> > >> > > >> > > > > > > event to cap the metadata topic size.
> > >> > > >> > > > > >
> > >> > > >> > > > > >
> > >> > > >> > > > > > Regarding this suggestion, I am not sure whether Scenario 4
> > >> > > >> > > > > > <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406618613#KIP1266:BoundingTheNumberOfRemoteLogMetadataMessagesviaCompactedTopic-Scenario4:SegmentDeletion>
> > >> > > >> > > > > > has
> > >> > > >> > > > > > covered it. I can add more rows in the Timeline Table
> > >> like
> > >> > > >> > T5+24hour
> > >> > > >> > > to
> > >> > > >> > > > > > indicate the messages are gone by then to explicitly
> > show
> > >> > that
> > >> > > >> > > messages
> > >> > > >> > > > > are
> > >> > > >> > > > > > deleted, thus the number of messages are capped in
> the
> > >> > topic.
> > >> > > >> > > > > >
> > >> > > >> > > > > > Regarding whether the topic __remote_log_metadata is
> > >> still
> > >> > > >> > > necessary, I
> > >> > > >> > > > > am
> > >> > > >> > > > > > inclined to continue to have this topic at least for
> > >> > debugging
> > >> > > >> > > purposes
> > >> > > >> > > > > so
> > >> > > >> > > > > > we can build confidence about the compacted topic
> > >> change, we
> > >> > > can
> > >> > > >> > > > > > always choose to remove this topic in the future once
> > we
> > >> all
> > >> > > >> agree
> > >> > > >> > it
> > >> > > >> > > > > > provides limited value for the users.
> > >> > > >> > > > > >
> > >> > > >> > > > > > Thanks,
> > >> > > >> > > > > > Lijun Tong
> > >> > > >> > > > > >
> > >> > > >> > > > > >
> > >> > > >> > > > > > On Mon, Jan 5, 2026 at 4:19 PM Henry Haiying Cai via dev <[email protected]> wrote:
> > >> > > >> > > > > >
> > >> > > >> > > > > > > Lijun,
> > >> > > >> > > > > > >
> > >> > > >> > > > > > > Thanks for the proposal and I liked your idea of
> > using
> > >> a
> > >> > > >> > compacted
> > >> > > >> > > > > topic
> > >> > > >> > > > > > > for tiered storage metadata topic.
> > >> > > >> > > > > > >
> > >> > > >> > > > > > > In our setup, we have set a shorter retention (3
> > days)
> > >> for
> > >> > > the
> > >> > > >> > > tiered
> > >> > > >> > > > > > > storage metadata topic to control the size growth.
> > We
> > >> can
> > >> > > do
> > >> > > >> > that
> > >> > > >> > > > > since
> > >> > > >> > > > > > we
> > >> > > >> > > > > > > control all topic's retention policy in our
> clusters
> > >> and
> > >> > we
> > >> > > >> set a
> > >> > > >> > > > > uniform
> > >> > > >> > > > > > > retention.policy for all our tiered storage topics.
> > I
> > >> can
> > >> > > see
> > >> > > >> > > other
> > >> > > >> > > > > > > users/companies will not be able to enforce that
> > >> retention
> > >> > > >> policy
> > >> > > >> > > to
> > >> > > >> > > > > all
> > >> > > >> > > > > > > tiered storage topics.
> > >> > > >> > > > > > >
> > >> > > >> > > > > > > Some suggestions: In your example scenarios, it
> would
> > >> also
> > >> > > be
> > >> > > >> > good
> > >> > > >> > > to
> > >> > > >> > > > > add
> > >> > > >> > > > > > > an example of remote log segment deletion triggered
> > by
> > >> > > >> retention
> > >> > > >> > > > policy
> > >> > > >> > > > > > > which will trigger generation of tombstone event
> into
> > >> > > metadata
> > >> > > >> > > topic
> > >> > > >> > > > > and
> > >> > > >> > > > > > > trigger log compaction/deletion 24 hour later, I
> > think
> > >> > this
> > >> > > is
> > >> > > >> > the
> > >> > > >> > > > key
> > >> > > >> > > > > > > event to cap the metadata topic size.
> > >> > > >> > > > > > >
> > >> > > >> > > > > > > For the original unbounded remote_log_metadata
> topic,
> > >> I am
> > >> > > not
> > >> > > >> > sure
> > >> > > >> > > > > > > whether we still need it or not.  If it is left
> only
> > >> for
> > >> > > audit
> > >> > > >> > > trail
> > >> > > >> > > > > > > purpose, people can set up a data ingestion
> pipeline
> > to
> > >> > > ingest
> > >> > > >> > the
> > >> > > >> > > > > > content
> > >> > > >> > > > > > > of metadata topic into a separate storage location.
> > I
> > >> > think
> > >> > > >> we
> > >> > > >> > can
> > >> > > >> > > > > have
> > >> > > >> > > > > > a
> > >> > > >> > > > > > > flag to have only one metadata topic (the compacted
> > >> > > version).
> > >> > > >> > > > > > >
> > >> > > >> > > > > > >
> > >> > > >> > > > > > > On Monday, January 5, 2026 at 01:22:42 PM PST,
> Lijun
> > >> Tong
> > >> > <
> > >> > > >> > > > > > > [email protected]> wrote:
> > >> > > >> > > > > > >
> > >> > > >> > > > > > >
> > >> > > >> > > > > > >
> > >> > > >> > > > > > >
> > >> > > >> > > > > > >
> > >> > > >> > > > > > > Hello Kafka Community,
> > >> > > >> > > > > > >
> > >> > > >> > > > > > > I would like to start a discussion on KIP-1266,
> which
> > >> > > >> proposes to
> > >> > > >> > > add
> > >> > > >> > > > > > > another new compacted remote log metadata topic for
> > the
> > >> > > tiered
> > >> > > >> > > > storage,
> > >> > > >> > > > > > to
> > >> > > >> > > > > > > limit the number of messages that need to be
> iterated
> > >> to
> > >> > > build
> > >> > > >> > the
> > >> > > >> > > > > remote
> > >> > > >> > > > > > > metadata state.
> > >> > > >> > > > > > >
> > >> > > >> > > > > > > KIP link: KIP-1266 Bounding The Number Of
> > >> > RemoteLogMetadata
> > >> > > >> > > Messages
> > >> > > >> > > > > via
> > >> > > >> > > > > > > Compacted RemoteLogMetadata Topic
> > >> > > >> > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-1266%3A+Bounding+The+Number+Of+RemoteLogMetadata+Messages+via+Compacted+Topic>
> > >> > > >> > > > > > >
> > >> > > >> > > > > > > Background:
> > >> > > >> > > > > > > The current Tiered Storage implementation uses a
> > >> > > >> > > > __remote_log_metadata
> > >> > > >> > > > > > > topic with infinite retention and delete-based
> > cleanup
> > >> > > policy,
> > >> > > >> > > > causing
> > >> > > >> > > > > > > unbounded growth, slow broker bootstrap, no
> mechanism
> > >> to
> > >> > > >> clean up
> > >> > > >> > > > > expired
> > >> > > >> > > > > > > segment metadata, and inefficient re-reading from
> > >> offset 0
> > >> > > >> during
> > >> > > >> > > > > > > leadership changes.
> > >> > > >> > > > > > >
> > >> > > >> > > > > > > Proposal:
> > >> > > >> > > > > > > A dual-topic approach that introduces a new
> > >> > > >> > > > > > __remote_log_metadata_compacted
> > >> > > >> > > > > > > topic using log compaction with deterministic
> > >> offset-based
> > >> > > >> keys,
> > >> > > >> > > > while
> > >> > > >> > > > > > > preserving the existing topic for audit history;
> this
> > >> > allows
> > >> > > >> > > brokers
> > >> > > >> > > > to
> > >> > > >> > > > > > > build their metadata cache exclusively from the
> > >> compacted
> > >> > > >> topic,
> > >> > > >> > > > > enables
> > >> > > >> > > > > > > cleanup of expired segment metadata through
> > tombstones,
> > >> > and
> > >> > > >> > > includes
> > >> > > >> > > > a
> > >> > > >> > > > > > > migration strategy to populate the new topic during
> > >> > > >> > > > upgrade—delivering
> > >> > > >> > > > > > > bounded metadata growth and faster broker startup
> > while
> > >> > > >> > maintaining
> > >> > > >> > > > > > > backward compatibility.
> > >> > > >> > > > > > >
> > >> > > >> > > > > > > More details are in the attached KIP link.
> > >> > > >> > > > > > > Looking forward to your thoughts.
> > >> > > >> > > > > > >
> > >> > > >> > > > > > > Thank you for your time!
> > >> > > >> > > > > > >
> > >> > > >> > > > > > > Best,
> > >> > > >> > > > > > > Lijun Tong
> > >> > > >> > > > > > >
> > >> > > >> > > > > >
> > >> > > >> > > > >
> > >> > > >> > > >
> > >> > > >> > >
> > >> > > >> >
> > >> > > >>
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> >
>
