Hi Jiunn-Yang,

Would you mind removing the terms "hot" and "cold" when describing
partitions in the KIP? I understand you are using them to describe the
"freshness" or the users' need for the records, but applying these terms to
the partition itself feels a bit unnatural.

After all, in this scenario, users don't really care whether a partition is
newly expanded or not. Their only expectation is that they won't silently
lose any live records produced to the topic during their active consumption.

Best, Chia-Ping



黃竣陽 <[email protected]> wrote on Sat, May 30, 2026 at 12:30 PM:

> Hello Jun,
>
> Thanks for the feedback. I have updated the KIP motivation section.
>
> Best Regards,
> Jiunn-Yang
>
> > Jun Rao via dev <[email protected]> wrote on May 30, 2026 at 1:12 AM:
> >
> > Hi, Jiunn-Yang,
> >
> > Thanks for the reply. I think we need a stronger motivation for the KIP.
> >
> > The KIP says "The core insight is that not all partitions without a
> > committed offset are the same. A newly expanded partition (hot) is
> > fundamentally different from a partition the consumer has never seen
> > because it predates the group (cold)." Why is the hot partition
> > fundamentally different from the cold?
> >
> > The KIP says "The existing by_duration policy is also insufficient because:
> >
> >   - The calculated seek time (now() - duration) varies across nodes due to
> >   clock skew. To be safe, users must set an overly large duration, causing
> >   unnecessary reprocessing.
> >   - On network errors, the client recalculates the seek time on retry,
> >   shifting the target timestamp forward and risking data loss."
> >
> > However, both of these situations are rare. If these issues persist, more
> > severe problems likely exist elsewhere. Rare situations don't need a common
> > solution. If users care about those rare situations, they can implement
> > customized logic using ConsumerRebalanceListener.onPartitionsAssigned().
> >
> > Jun
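
Jun's suggestion can be illustrated with the retry drift from the quoted KIP text: the drift only happens if now() - duration is recomputed on every attempt. The plain-Java sketch below is an assumption-laden illustration, not the Kafka client API: `OffsetLookup` is a hypothetical stand-in for a timestamp-to-offset lookup, and the retry loop is illustrative. It resolves the target timestamp once and reuses it for every retry, which is the kind of logic a custom onPartitionsAssigned() could apply.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

final class FixedSeekTarget {
    /** Hypothetical stand-in for a timestamp-to-offset lookup that may fail transiently. */
    interface OffsetLookup {
        long offsetForTimestamp(long timestampMs) throws Exception;
    }

    /**
     * Resolves (now - lookback) ONCE, then retries the lookup with that same
     * timestamp, so a retry can never move the target forward in time.
     */
    static long seekOffset(LongSupplier nowMs, Duration lookback,
                           OffsetLookup lookup, int maxAttempts) {
        final long targetMs = nowMs.getAsLong() - lookback.toMillis();
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return lookup.offsetForTimestamp(targetMs);
            } catch (Exception e) {
                last = e; // transient failure: retry with the unchanged targetMs
            }
        }
        throw new IllegalStateException("lookup failed after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) {
        long[] clock = {10_000L};                 // simulated wall clock in ms
        List<Long> requested = new ArrayList<>(); // timestamp passed to each attempt
        OffsetLookup flaky = ts -> {
            requested.add(ts);
            clock[0] += 5_000L;                   // time passes during every attempt
            if (requested.size() < 3) {
                throw new java.io.IOException("simulated network error");
            }
            return 42L;
        };
        long offset = seekOffset(() -> clock[0], Duration.ofSeconds(1), flaky, 5);
        // Every attempt used the same target: 10000 - 1000 = 9000 ms.
        System.out.println(offset + " " + requested); // prints "42 [9000, 9000, 9000]"
    }
}
```

Because `targetMs` is fixed before the loop, a slow or failing lookup cannot push the seek point past records that were produced while the retries were happening.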
> >
> >
> > On Sun, May 17, 2026 at 6:50 AM 黃竣陽 <[email protected]> wrote:
> >
> >> Hello chia,
> >>
> >> Thanks for the feedback.
> >>
> >>> If the creation time exists, the returned value should always be greater
> >>> than or equal to zero, right?
> >> I have explicitly mentioned this in the KIP.
> >>
> >>>> New  Old (MetadataResponse v0–13)    positive        any     field absent    UnsupportedVersionException
> >>
> >> The earliest point at which we can detect the version mismatch is during
> >> the first metadata fetch after assignment, which occurs inside poll().
> >> Therefore, the user would encounter an UnsupportedVersionException from
> >> poll(). I’ll clarify this in the KIP.
> >>
> >> Best Regards,
> >> Jiunn-Yang
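
The check described above could look roughly like the following plain-Java sketch. Several points are assumptions: v14 is taken from the thread as the first MetadataResponse version carrying PartitionAgeMs, a non-positive max age is assumed to mean the feature is disabled, and `UnsupportedVersionException` here is a local stand-in for `org.apache.kafka.common.errors.UnsupportedVersionException`, not the real class. `ensurePartitionAgeSupported` is a hypothetical name.

```java
final class MetadataVersionCheck {
    // Per the thread, MetadataResponse v14 is the first version with PartitionAgeMs.
    static final short FIRST_VERSION_WITH_PARTITION_AGE = 14;

    // Local stand-in for org.apache.kafka.common.errors.UnsupportedVersionException.
    static final class UnsupportedVersionException extends RuntimeException {
        UnsupportedVersionException(String message) {
            super(message);
        }
    }

    /**
     * Hypothetical check run on the first metadata fetch after assignment (inside poll()).
     * Assumption: maxAgeMs <= 0 means the max-age feature is disabled, so no check is needed.
     */
    static void ensurePartitionAgeSupported(short brokerMaxMetadataVersion, long maxAgeMs) {
        if (maxAgeMs > 0 && brokerMaxMetadataVersion < FIRST_VERSION_WITH_PARTITION_AGE) {
            throw new UnsupportedVersionException(
                "max age is set to " + maxAgeMs + " ms, but the broker only supports MetadataResponse v"
                    + brokerMaxMetadataVersion + "; PartitionAgeMs requires v"
                    + FIRST_VERSION_WITH_PARTITION_AGE);
        }
    }

    public static void main(String[] args) {
        ensurePartitionAgeSupported((short) 14, 600_000L); // new broker: OK
        ensurePartitionAgeSupported((short) 13, 0L);       // feature disabled: OK
        try {
            ensurePartitionAgeSupported((short) 13, 600_000L); // old broker, feature on
        } catch (UnsupportedVersionException e) {
            System.out.println("poll() would fail: " + e.getMessage());
        }
    }
}
```

Failing loudly here, instead of quietly applying the base policy, is what gives operators the visible signal mentioned later in the thread.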
> >>
> >>> Chia-Ping Tsai <[email protected]> wrote on May 17, 2026 at 4:50 PM:
> >>>
> >>> hi Jiunn
> >>>
> >>>> PartitionAgeMs (int64, default -1): The age of this partition in
> >>>> milliseconds, computed server-side by the broker as broker_current_time -
> >>>> partition_creation_time. Returns -1 if the broker does not support this
> >>>> feature or the partition creation time is unknown.
> >>>
> >>> If the creation time exists, the returned value should always be greater
> >>> than or equal to zero, right?
> >>>
> >>>> New  Old (MetadataResponse v0–13)    positive        any     field absent    UnsupportedVersionException
> >>>
> >>> Will the user encounter UnsupportedVersionException when calling `poll()`?
> >>>
> >>> Best,
> >>> Chia-Ping
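
To make the non-negativity question concrete, here is a hedged sketch of the broker-side computation quoted above (broker_current_time - partition_creation_time, or -1 when unknown). The clamp to zero is an assumption about how the guarantee could be enforced, not something the quoted KIP text specifies; it covers a broker whose clock lags the one that recorded the creation time. `partitionAgeMs` is a hypothetical helper name.

```java
import java.util.OptionalLong;

final class PartitionAge {
    /** Sentinel from the KIP: feature unsupported or creation time unknown. */
    static final long UNKNOWN = -1L;

    /**
     * broker_current_time - partition_creation_time, clamped at 0. The clamp is an
     * assumption: it keeps the age non-negative under small inter-broker clock skew.
     */
    static long partitionAgeMs(long brokerNowMs, OptionalLong creationTimeMs) {
        if (!creationTimeMs.isPresent()) {
            return UNKNOWN;
        }
        return Math.max(0L, brokerNowMs - creationTimeMs.getAsLong());
    }

    public static void main(String[] args) {
        System.out.println(partitionAgeMs(1_000_500L, OptionalLong.of(1_000_000L))); // 500
        System.out.println(partitionAgeMs(1_000_000L, OptionalLong.of(1_000_020L))); // 0 (lagging clock)
        System.out.println(partitionAgeMs(1_000_000L, OptionalLong.empty()));       // -1
    }
}
```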
> >>>
> >>>
> >>> On 2026/05/16 04:30:49 黃竣陽 wrote:
> >>>> Hello Jun, chia,
> >>>>
> >>>> I've updated KIP-1327 with a design change based on the discussion
> >>>> feedback.
> >>>>
> >>>> The updated design decouples the new-partition reset behavior from
> >>>> the base auto.offset.reset policy:
> >>>>
> >>>> - auto.offset.reset.max.age.ms now applies to all auto.offset.reset values
> >>>> (latest, earliest, by_duration, none).
> >>>> - For new ("hot") partitions, the consumer resets according to the
> >>>> auto.offset.reset.new.partitions setting.
> >>>> - For existing ("cold") partitions, the base auto.offset.reset policy
> >>>> continues to apply unchanged.
> >>>> - The new-partition reset behavior is represented by a separate internal
> >>>> config (auto.offset.reset.new.partitions, currently fixed to earliest).
> >>>> This decoupled design makes it straightforward to promote the behavior to
> >>>> a public user-facing configuration in a future KIP.
> >>>>
> >>>> Best Regards,
> >>>> Jiunn-Yang
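
The per-partition decision in the updated design can be sketched as a pure function. This is a minimal illustration rather than the KIP's implementation: `ResetPolicy` and `effectivePolicy` are hypothetical names, a partition younger than auto.offset.reset.max.age.ms is treated as new, and an unknown age (-1) is assumed to fall back to the base policy.

```java
final class NewPartitionReset {
    enum ResetPolicy { EARLIEST, LATEST, BY_DURATION, NONE }

    /**
     * Picks the reset policy for one partition that has no committed offset.
     *
     * @param partitionAgeMs     PartitionAgeMs from metadata, or -1 if unknown
     * @param maxAgeMs           auto.offset.reset.max.age.ms; younger partitions count as new
     * @param basePolicy         auto.offset.reset, used for existing partitions
     * @param newPartitionPolicy auto.offset.reset.new.partitions (currently fixed to EARLIEST)
     */
    static ResetPolicy effectivePolicy(long partitionAgeMs, long maxAgeMs,
                                       ResetPolicy basePolicy, ResetPolicy newPartitionPolicy) {
        // Assumption: an unknown age (-1) is treated like an existing partition.
        boolean isNew = partitionAgeMs >= 0 && partitionAgeMs < maxAgeMs;
        return isNew ? newPartitionPolicy : basePolicy;
    }

    public static void main(String[] args) {
        long maxAgeMs = 10 * 60 * 1000L; // illustrative threshold: 10 minutes
        ResetPolicy earliest = ResetPolicy.EARLIEST;
        System.out.println(effectivePolicy(30_000L, maxAgeMs, ResetPolicy.LATEST, earliest));      // EARLIEST
        System.out.println(effectivePolicy(172_800_000L, maxAgeMs, ResetPolicy.LATEST, earliest)); // LATEST
        System.out.println(effectivePolicy(-1L, maxAgeMs, ResetPolicy.BY_DURATION, earliest));     // BY_DURATION
        System.out.println(effectivePolicy(30_000L, maxAgeMs, ResetPolicy.NONE, earliest));        // EARLIEST
    }
}
```

In this shape a non-positive max age disables the special case automatically, since no age can be below it, and promoting auto.offset.reset.new.partitions to a public config would only change where `newPartitionPolicy` comes from.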
> >>>>
> >>>>
> >>>> Chia-Ping Tsai <[email protected]> wrote on May 16, 2026 at 7:46 AM:
> >>>>>
> >>>>> hi Jun
> >>>>>
> >>>>> I see what you mean now. The proposal from me is listed below:
> >>>>>
> >>>>> 1) Add auto.offset.reset.new.partitions with a default value of
> >>>>> earliest. It fixes the data loss from both by_duration and latest, and it
> >>>>> does not change the logic of auto.offset.reset=earliest.
> >>>>> 2) Mark auto.offset.reset.new.partitions as an internal configuration.
> >>>>> auto.offset.reset.new.partitions=earliest already addresses the issue,
> >>>>> and we can discuss the use cases of other values in a separate KIP.
> >>>>> 3) Both configs, auto.offset.reset.new.partitions and
> >>>>> auto.offset.reset.latest.max.age.ms, will be applied to all
> >>>>> auto.offset.reset policies for consistency.
> >>>>>
> >>>>> WDYT?
> >>>>>
> >>>>> On 2026/05/15 20:53:20 Jun Rao via dev wrote:
> >>>>>> Hi, Chia-Ping,
> >>>>>>
> >>>>>> Thanks for the reply.
> >>>>>>
> >>>>>> 1. In the motivation section, the KIP says "When a Kafka topic is expanded
> >>>>>> with new partitions, consumers using the latest auto offset reset policy
> >>>>>> will silently miss all records produced to those partitions before the
> >>>>>> consumer discovers them." If a user sets
> >>>>>> auto.offset.reset=by_duration=1sec, the same record loss issue could also
> >>>>>> happen, right?
> >>>>>>
> >>>>>> 2. I was thinking auto.offset.reset.new.partitions would take the same
> >>>>>> values as auto.offset.reset, so a user could set it to by_duration if
> >>>>>> needed.
> >>>>>>
> >>>>>> Jun
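
Jun's first point can be checked with a small model. The plain-Java sketch below uses hypothetical helper names and counts how many records already written to a newly added partition are skipped when the consumer discovers it some time after creation. It assumes record timestamps are non-decreasing in offset order and that by_duration seeks to the first record whose timestamp is at or after (discovery time - duration).

```java
import java.util.List;

final class SkippedRecords {
    /** latest starts at the log end offset, so every record already written is skipped. */
    static int skippedByLatest(List<Long> recordTimestampsMs) {
        return recordTimestampsMs.size();
    }

    /**
     * by_duration seeks to the first offset whose timestamp is >= (discovery - duration);
     * every record before that offset is skipped. Timestamps are assumed non-decreasing.
     */
    static int skippedByDuration(List<Long> recordTimestampsMs, long discoveredAtMs, long durationMs) {
        long targetMs = discoveredAtMs - durationMs;
        int offset = 0;
        while (offset < recordTimestampsMs.size() && recordTimestampsMs.get(offset) < targetMs) {
            offset++;
        }
        return offset;
    }

    public static void main(String[] args) {
        // Records written to a new partition (created at t=0) before the consumer notices it at t=3000 ms.
        List<Long> timestamps = List.of(100L, 500L, 1_500L, 2_500L);
        long discoveredAtMs = 3_000L;
        System.out.println("latest skips " + skippedByLatest(timestamps)); // 4
        System.out.println("by_duration=1s skips "
            + skippedByDuration(timestamps, discoveredAtMs, 1_000L));        // 3
    }
}
```

With records at 100, 500, 1,500 and 2,500 ms and discovery at 3,000 ms, latest skips all four and a one-second by_duration still skips three, while earliest skips none.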
> >>>>>>
> >>>>>> On Thu, May 14, 2026 at 4:06 PM Chia-Ping Tsai <[email protected]> wrote:
> >>>>>>
> >>>>>>> hi Jun
> >>>>>>>
> >>>>>>> Thanks for the feedback. I might be missing something important from your
> >>>>>>> suggestion, so please bear with me as I try to clarify with a few questions:
> >>>>>>>
> >>>>>>> 1. Is there a strong use case for extending this logic to other reset
> >>>>>>> policies? Unlike latest, policies like earliest or by_duration don't seem
> >>>>>>> to suffer from the same silent data loss issue when a partition is expanded.
> >>>>>>>
> >>>>>>> 2. What values would we expect users to configure for
> >>>>>>> auto.offset.reset.new.partitions? If they set it to earliest or latest,
> >>>>>>> we might run into the exact same edge cases. For example, if a consumer is
> >>>>>>> offline for a while and a new partition is created during that downtime,
> >>>>>>> the user might actually want to skip to latest when resuming, rather than
> >>>>>>> reading from earliest just because the partition is technically "new" to
> >>>>>>> the group.
> >>>>>>>
> >>>>>>> This is exactly why we opted to introduce a max.age threshold. It gives
> >>>>>>> users a time-bound way to define what is genuinely "hot/new" and what is
> >>>>>>> just an old partition they haven't seen yet.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Chia-Ping
> >>>>>>>
> >>>>>>> On 2026/05/14 20:48:09 Jun Rao via dev wrote:
> >>>>>>>> Hi, Jiunn-Yang,
> >>>>>>>>
> >>>>>>>> Thanks for the KIP.
> >>>>>>>>
> >>>>>>>> I find auto.offset.reset.latest.max.age a bit weird. It only applies when
> >>>>>>>> auto.offset.reset is latest. However, it seems that the motivation equally
> >>>>>>>> applies when auto.offset.reset is set to other values like by_duration.
> >>>>>>>> The intention is that we want to have a separate way to control newly
> >>>>>>>> created partitions vs existing partitions when the group starts. Have we
> >>>>>>>> considered adding a new config like auto.offset.reset.new.partitions? If
> >>>>>>>> this new config is not set, the offset reset policy defaults to the policy
> >>>>>>>> used for existing partitions. The user could set it explicitly to
> >>>>>>>> customize the behavior for new partitions.
> >>>>>>>>
> >>>>>>>> Jun
> >>>>>>>>
> >>>>>>>> On Thu, May 7, 2026 at 5:07 AM 黃竣陽 <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> Hi all,
> >>>>>>>>>
> >>>>>>>>> I’d like to manually bump this thread.
> >>>>>>>>>
> >>>>>>>>> Best Regards,
> >>>>>>>>> Jiunn-Yang
> >>>>>>>>>
> >>>>>>>>>> 黃竣陽 <[email protected]> wrote on May 1, 2026 at 10:37 PM:
> >>>>>>>>>>
> >>>>>>>>>> Hello all,
> >>>>>>>>>>
> >>>>>>>>>> Thanks for the feedback.
> >>>>>>>>>>
> >>>>>>>>>> DJ01/DJ02:
> >>>>>>>>>>
> >>>>>>>>>> MetadataResponse bumps from v13 to v14. The PartitionMetadata struct
> >>>>>>>>>> gains a new field PartitionAgeMs (int64, default -1), computed
> >>>>>>>>>> server-side by the broker as broker_current_time - partition_creation_time.
> >>>>>>>>>>
> >>>>>>>>>> I also added the consumer heartbeat flow: when MembershipManager detects a
> >>>>>>>>>> newly assigned partition, it explicitly invalidates the metadata for the
> >>>>>>>>>> affected topic and forces a fresh MetadataRequest before making the offset
> >>>>>>>>>> reset decision, even if the topic ID is already in the cache.
> >>>>>>>>>>
> >>>>>>>>>> MB0:
> >>>>>>>>>>
> >>>>>>>>>> The consumer learns the broker's maximum supported MetadataResponse
> >>>>>>>>>> version via the ApiVersions negotiation at connection time. If the
> >>>>>>>>>> negotiated version is older than v14, the consumer knows the broker does
> >>>>>>>>>> not support PartitionAgeMs at all and can throw an
> >>>>>>>>>> UnsupportedVersionException immediately, rather than silently falling
> >>>>>>>>>> back to latest and risking data loss without any operator-visible signal.
> >>>>>>>>>>
> >>>>>>>>>> MB1/MB2/MB3:
> >>>>>>>>>>
> >>>>>>>>>> I have addressed these changes in the KIP.
> >>>>>>>>>>
> >>>>>>>>>> Best Regards,
> >>>>>>>>>> Jiunn-Yang
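
The forced refresh in the heartbeat flow matters because a cached topic entry can predate the expansion. The sketch below uses hypothetical names throughout (it is not MembershipManager's real API): the metadata cache is modeled as a map, and a `Function` stands in for sending a MetadataRequest. It shows why invalidating the topic is needed even when the topic is already cached.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class AssignmentMetadataRefresh {
    // Hypothetical cache: topic -> (partition -> PartitionAgeMs).
    private final Map<String, Map<Integer, Long>> cache = new HashMap<>();
    // Hypothetical stand-in for sending a MetadataRequest for one topic.
    private final Function<String, Map<Integer, Long>> fetchMetadata;
    private int fetchCount = 0;

    AssignmentMetadataRefresh(Function<String, Map<Integer, Long>> fetchMetadata) {
        this.fetchMetadata = fetchMetadata;
    }

    /** Ordinary lookup: answered from the cache whenever the topic is already known. */
    Map<Integer, Long> partitionAges(String topic) {
        return cache.computeIfAbsent(topic, t -> {
            fetchCount++;
            return fetchMetadata.apply(t);
        });
    }

    /** Newly assigned partition detected: invalidate the topic first, forcing a fresh fetch. */
    Map<Integer, Long> partitionAgesForNewAssignment(String topic) {
        cache.remove(topic);
        return partitionAges(topic);
    }

    int fetchCount() {
        return fetchCount;
    }

    public static void main(String[] args) {
        long[] brokerNowMs = {1_000L};
        // Simulated broker: partition 0 created at t=0, partition 1 added at t=4000.
        Function<String, Map<Integer, Long>> fetch = topic -> {
            Map<Integer, Long> ages = new HashMap<>();
            ages.put(0, brokerNowMs[0]);
            if (brokerNowMs[0] >= 4_000L) {
                ages.put(1, brokerNowMs[0] - 4_000L);
            }
            return ages;
        };
        AssignmentMetadataRefresh metadata = new AssignmentMetadataRefresh(fetch);
        metadata.partitionAges("orders");  // cached at t=1000, before the expansion
        brokerNowMs[0] = 5_000L;           // partition 1 now exists and gets assigned
        System.out.println("cached:    " + metadata.partitionAges("orders")); // no entry for partition 1
        System.out.println("refreshed: " + metadata.partitionAgesForNewAssignment("orders"));
        System.out.println("fetches:   " + metadata.fetchCount()); // 2
    }
}
```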
> >>>>>>>>>>
> >>>>>>>>>>> Chia-Ping Tsai <[email protected]> wrote on Apr 29, 2026 at 4:04 PM:
> >>>>>>>>>>>
> >>>>>>>>>>> hi David
> >>>>>>>>>>>
> >>>>>>>>>>> I agree with the direction of moving the 'age' resolution from the
> >>>>>>>>>>> Heartbeat API to the Metadata API to keep the control plane clean. The
> >>>>>>>>>>> main trade-off, as we noted before, is introducing inter-broker clock
> >>>>>>>>>>> skew. The Group Coordinator approach provided a single source of truth
> >>>>>>>>>>> for time.
> >>>>>>>>>>>
> >>>>>>>>>>> However, realistically, this time skew should be negligible. Given that
> >>>>>>>>>>> the max.age threshold will likely be configured in minutes or hours, a
> >>>>>>>>>>> typical NTP skew (in milliseconds) between brokers won't impact the
> >>>>>>>>>>> fallback decision.
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>> Chia-Ping
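
The skew argument above can be made precise with a short bound. Let a be a partition's true age, a' the age computed by whichever broker answers, ε the maximum inter-broker skew (so |a' - a| ≤ ε), and T the max.age threshold, with a partition counted as new when its age is below T. The classification can only change for partitions whose true age lies within ε of T; the values of T and ε below are illustrative assumptions, not numbers from the KIP.

```latex
\[
|a' - a| \le \varepsilon
\;\Longrightarrow\;
\bigl(a' < T \iff a < T\bigr)
\quad\text{whenever}\quad a < T - \varepsilon \ \text{or}\ a \ge T + \varepsilon .
\]
\[
T = 10\ \text{min} = 600\,000\ \text{ms},\quad \varepsilon = 50\ \text{ms}
\;\Longrightarrow\;
\text{only partitions with } 599\,950 \le a < 600\,050\ \text{ms can be misclassified.}
\]
```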
> >>>>>>>>>>>
> >>>>>>>>>>>> David Jacot via dev <[email protected]> wrote on Apr 29, 2026 at 3:29 PM:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks for the KIP!
> >>>>>>>>>>>>
> >>>>>>>>>>>> Sorry, I haven't really followed the previous conversation, but I took a
> >>>>>>>>>>>> quick look at this one.
> >>>>>>>>>>>>
> >>>>>>>>>>>> DJ01: I don't clearly understand the flow with the ConsumerGroupHeartbeat
> >>>>>>>>>>>> API after reading the KIP. There is a new boolean; the KIP states that
> >>>>>>>>>>>> partition ages are returned only when this boolean is set. Implicitly,
> >>>>>>>>>>>> this means that when the consumer receives a new partition, it will issue
> >>>>>>>>>>>> a new HB request with the boolean set to receive the ages. Is my
> >>>>>>>>>>>> understanding correct? We should perhaps clarify the flow and also
> >>>>>>>>>>>> explain how it fits into the existing flow (e.g. list offsets, fetch
> >>>>>>>>>>>> offsets, etc.).
> >>>>>>>>>>>> DJ02: If my understanding is correct, I wonder if the
> >>>>>>>>>>>> ConsumerGroupHeartbeat API is the right place for this, given that a new
> >>>>>>>>>>>> round trip is done anyway. Alternatively, it could simply include the
> >>>>>>>>>>>> metadata. Generally, we should be careful not to overload the
> >>>>>>>>>>>> ConsumerGroupHeartbeat API with unrelated concepts. The API is a control
> >>>>>>>>>>>> plane API for assigning or revoking partitions. The fact that we don't
> >>>>>>>>>>>> want to add it to the corresponding Streams API also suggests something
> >>>>>>>>>>>> is not quite right. What would we do if we want to support Streams in
> >>>>>>>>>>>> the future?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>> David
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Wed, Apr 29, 2026 at 12:28 AM Muralidhar Basani via dev <[email protected]> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi Jiunn,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thank you for this great KIP. Good to know about the gap.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> mb-0 - Why a new v2 version bump for the RequestPartitionAges field?
> >>>>>>>>>>>>> Could a tagged field (e.g., PartitionAges on TopicPartitions in the
> >>>>>>>>>>>>> response) be used here to avoid a version bump?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> mb-1 - For the new config, is there a recommended value or a ConfigDef
> >>>>>>>>>>>>> validator? Perhaps it should be based on metadata.max.age.ms? Sizing
> >>>>>>>>>>>>> instructions could be part of the Javadocs, I guess.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> mb-2 - (minor) As there are no changes to Kafka Streams, would it be
> >>>>>>>>>>>>> better to add this new config auto.offset.reset.latest.max.age to the
> >>>>>>>>>>>>> StreamsConfig block list (NON_CONFIGURABLE_CONSUMER_DEFAULT_CONFIGS) for
> >>>>>>>>>>>>> a clear warning, in case users configure it? This is the most familiar
> >>>>>>>>>>>>> consumer config, and users might easily configure it by mistake. Or maybe
> >>>>>>>>>>>>> it's not worth adding.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> mb-3 - (minor) The phrasing "the consumer falls back to earliest" reads
> >>>>>>>>>>>>> as if the config were being changed per partition, which isn't
> >>>>>>>>>>>>> supported. Maybe rephrase it to something like "the consumer resolves
> >>>>>>>>>>>>> the initial position to the start offset for that partition", as if
> >>>>>>>>>>>>> earliest were applied to that partition only while the
> >>>>>>>>>>>>> auto.offset.reset config stays unchanged.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> Murali
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Tue, Apr 28, 2026 at 2:48 PM 黃竣陽 <[email protected]> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi chia,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I have updated the KIP to include this change.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best Regards,
> >>>>>>>>>>>>>> Jiunn-Yang
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Chia-Ping Tsai <[email protected]> wrote on Apr 28, 2026 at 8:03 PM:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> hi Jiunn-Yang
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> chia_0: Should we expose the partition creation time via the Admin API?
> >>>>>>>>>>>>>>> I assume it would be valuable for users to diagnose and troubleshoot
> >>>>>>>>>>>>>>> the behavior of auto.offset.reset.latest.max.age.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>> Chia-Ping
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On 2026/04/28 10:47:58 黃竣陽 wrote:
> >>>>>>>>>>>>>>>> Hello everyone,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I would like to start a discussion on KIP-1327 Prevent Hot Data Loss on
> >>>>>>>>>>>>>>>> Partition Expansion for Latest Policy
> >>>>>>>>>>>>>>>> <https://urldefense.com/v3/__https://cwiki.apache.org/confluence/x/KY4mGQ__;!!Ayb5sqE7!qF4q1QzF1RRgP61D7A2xuEai1ky7fepKDKFFvpNBuePikH-ULmT87TvuuZzy5kau5E4y5zMZAmfQQiwZomM$>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> This proposal aims to introduce auto.offset.reset.latest.max.age, a
> >>>>>>>>>>>>>>>> consumer config that lets the latest reset policy distinguish newly
> >>>>>>>>>>>>>>>> expanded (hot) partitions from long-existing (cold) ones. Partitions
> >>>>>>>>>>>>>>>> younger than the configured threshold automatically fall back to
> >>>>>>>>>>>>>>>> earliest, preventing silent data loss during topic expansion without
> >>>>>>>>>>>>>>>> forcing a full historical reprocess.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>> Jiunn-Yang