Hello Jun, Chia,

I've updated KIP-1327 with a design change based on the discussion feedback.

The updated design decouples the new-partition reset behavior from 
the base auto.offset.reset policy: 

- auto.offset.reset.max.age.ms now applies to all auto.offset.reset values 
(latest, earliest, by_duration, none). 
- For new ("hot") partitions, the consumer resets according to the
auto.offset.reset.new.partitions config setting.
- For existing ("cold") partitions, the base auto.offset.reset policy continues
to apply unchanged.
- The new-partition reset behavior is represented by a separate internal config
(auto.offset.reset.new.partitions, currently fixed to earliest). This decoupled
design makes it straightforward to promote the behavior to a public, user-facing
configuration in a future KIP.
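
As a rough illustration of the decision described above, here is a minimal
sketch. The class, enum, and method names are hypothetical and are not the
actual consumer code; treating "hot" as strictly younger than the threshold
and rejecting an unknown age are assumptions of the sketch.

```java
/**
 * Hypothetical sketch of the decoupled reset decision; the class, enum, and
 * method names are illustrative and are not part of the Kafka consumer API.
 */
final class NewPartitionResetSketch {

    /** Mirrors the auto.offset.reset values listed above. */
    enum ResetPolicy { LATEST, EARLIEST, BY_DURATION, NONE }

    /**
     * Picks the reset policy for a single partition.
     *
     * @param basePolicy          value of auto.offset.reset
     * @param newPartitionsPolicy value of auto.offset.reset.new.partitions
     * @param partitionAgeMs      partition age reported by the broker
     * @param maxAgeMs            value of auto.offset.reset.max.age.ms
     */
    static ResetPolicy resolve(ResetPolicy basePolicy,
                               ResetPolicy newPartitionsPolicy,
                               long partitionAgeMs,
                               long maxAgeMs) {
        // Assumption: handling of an unknown (negative) age is out of scope here.
        if (partitionAgeMs < 0) {
            throw new IllegalArgumentException("partition age is unknown: " + partitionAgeMs);
        }
        // Assumption: "hot" means strictly younger than the threshold.
        boolean hot = partitionAgeMs < maxAgeMs;
        // Hot partitions use the new-partition policy; cold ones keep the base policy.
        return hot ? newPartitionsPolicy : basePolicy;
    }

    public static void main(String[] args) {
        long maxAgeMs = 10 * 60 * 1000L; // 10 minutes, an arbitrary example value
        // 30-second-old partition: hot
        System.out.println(resolve(ResetPolicy.LATEST, ResetPolicy.EARLIEST, 30_000L, maxAgeMs));
        // 1-hour-old partition: cold
        System.out.println(resolve(ResetPolicy.LATEST, ResetPolicy.EARLIEST, 3_600_000L, maxAgeMs));
    }
}
```

With these example values, the 30-second-old partition resolves to EARLIEST
and the one-hour-old partition keeps the base LATEST policy.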

Best Regards,
Jiunn-Yang


> Chia-Ping Tsai <[email protected]> wrote on May 16, 2026, at 7:46 AM:
> 
> hi Jun
> 
> I see what you mean now. My proposal is as follows:
> 
> 1) Add auto.offset.reset.new.partitions with a default value of earliest. It 
> fixes the data loss from both by_duration and latest, and it does not change 
> the logic of auto.offset.reset=earliest.
> 2) Mark auto.offset.reset.new.partitions as an internal configuration. 
> auto.offset.reset.new.partitions=earliest already addresses the issue, and we 
> can discuss the use cases of other values in a separate KIP.
> 3) Both configs, auto.offset.reset.new.partitions and 
> auto.offset.reset.latest.max.age.ms, will apply to all auto.offset.reset 
> values for consistency.
> 
> WDYT?
> 
> On 2026/05/15 20:53:20 Jun Rao via dev wrote:
>> Hi, Chia-Ping,
>> 
>> Thanks for the reply.
>> 
>> 1. In the motivation section, the KIP says "When a Kafka topic is expanded
>> with new partitions, consumers using the latest auto offset reset policy
>> will silently miss all records produced to those partitions before the
>> consumer discovers them.". If a user sets
>> auto.offset.reset=by_duration=1sec, the same record loss issue could also
>> happen, right?
>> 
>> 2. I was thinking auto.offset.reset.new.partitions will take the same
>> values as auto.offset.reset. So a user could set it by_duration if needed.
>> 
>> Jun
>> 
>> On Thu, May 14, 2026 at 4:06 PM Chia-Ping Tsai <[email protected]> wrote:
>> 
>>> hi Jun
>>> 
>>> Thanks for the feedback. I might be missing something important from your
>>> suggestion, so please bear with me as I try to clarify with a few questions:
>>> 
>>> 1. Is there a strong use case for extending this logic to other reset
>>> policies? Unlike latest, policies like earliest or by_duration don't seem
>>> to suffer from the same silent data loss issue when a partition is expanded.
>>> 
>>> 2. What values would we expect users to configure for
>>> auto.offset.reset.new.partitions? If they set it to earliest or latest,
>>> we might run into the exact same edge cases. For example, if a consumer is
>>> offline for a while and a new partition is created during that downtime,
>>> the user might actually want to skip to latest when resuming, rather than
>>> reading from earliest just because the partition is technically "new" to
>>> the group.
>>> 
>>> This is exactly why we opted for introducing a max.age threshold. It gives
>>> users a time-bound way to define what is genuinely "hot/new" and what is
>>> just an old partition they haven't seen yet.
>>> 
>>> Best,
>>> Chia-Ping
>>> 
>>> On 2026/05/14 20:48:09 Jun Rao via dev wrote:
>>>> Hi, Jiunn-Yang,
>>>> 
>>>> Thanks for the KIP.
>>>> 
>>>> I find auto.offset.reset.latest.max.age a bit weird. It only applies when
>>>> auto.offset.reset is latest. However, it seems that the motivation equally
>>>> applies when auto.offset.reset is set to other values like by_duration. The
>>>> intention is that we want to have a separate way to control newly created
>>>> partitions vs existing partitions when the group starts. Have we considered
>>>> adding a new config like auto.offset.reset.new.partitions? If this new
>>>> config is not set, the offset reset policy defaults to the policy used for
>>>> existing partitions. The user could set it explicitly to customize the
>>>> behavior for new partitions.
>>>> 
>>>> Jun
>>>> 
>>>> On Thu, May 7, 2026 at 5:07 AM 黃竣陽 <[email protected]> wrote:
>>>> 
>>>>> Hi all,
>>>>> 
>>>>> I’d like to manually bump this thread.
>>>>> 
>>>>> Best Regards,
>>>>> Jiunn-Yang
>>>>> 
>>>>>> 黃竣陽 <[email protected]> wrote on May 1, 2026, at 10:37 PM:
>>>>>> 
>>>>>> Hello all,
>>>>>> 
>>>>>> Thanks for the feedback.
>>>>>> 
>>>>>> DJ01/DJ02:
>>>>>> 
>>>>>> MetadataResponse bumps from v13 to v14. The PartitionMetadata struct
>>>>>> gains a new field PartitionAgeMs (int64, default -1), computed server-side
>>>>>> by the broker as broker_current_time - partition_creation_time.
>>>>>> 
>>>>>> I have also added the consumer heartbeat flow: when MembershipManager
>>>>>> detects a newly assigned partition, it explicitly invalidates the metadata
>>>>>> for the affected topic and forces a fresh MetadataRequest before making the
>>>>>> offset reset decision, even if the topic ID is already in the cache.
>>>>>> 
>>>>>> MB0:
>>>>>> 
>>>>>> The consumer learns the broker's maximum supported MetadataResponse
>>>>>> version via the ApiVersions negotiation at connection time. If the
>>>>>> negotiated version is unsupported, the consumer knows the broker does not
>>>>>> support PartitionAgeMs at all and can throw an UnsupportedVersionException
>>>>>> immediately, rather than silently falling back to latest and risking data
>>>>>> loss without any operator-visible signal.
>>>>>> 
>>>>>> MB1/MB2/MB3:
>>>>>> 
>>>>>> I have addressed these changes in the KIP.
>>>>>> 
>>>>>> Best Regards,
>>>>>> Jiunn-Yang
>>>>>> 
>>>>>>> Chia-Ping Tsai <[email protected]> wrote on April 29, 2026, at 4:04 PM:
>>>>>>> 
>>>>>>> hi David
>>>>>>> 
>>>>>>> I agree with the direction of moving the 'age' resolution from the
>>>>>>> Heartbeat API to the Metadata API to keep the control plane clean. The main
>>>>>>> trade-off, as we noted before, is introducing inter-broker clock skew. The
>>>>>>> Group Coordinator approach provided a single source of truth for time.
>>>>>>> 
>>>>>>> However, realistically, this time skew should be negligible. Given that
>>>>>>> the max.age threshold will likely be configured in minutes or hours, a
>>>>>>> typical NTP skew (in milliseconds) between brokers won't impact the
>>>>>>> fallback decision.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Chia-Ping
>>>>>>> 
>>>>>>>> David Jacot via dev <[email protected]> wrote on April 29, 2026, at 3:29 PM:
>>>>>>>> 
>>>>>>>> Hi all,
>>>>>>>> 
>>>>>>>> Thanks for the KIP!
>>>>>>>> 
>>>>>>>> Sorry, I haven't really followed the previous conversation but I took a
>>>>>>>> quick look at this one.
>>>>>>>> 
>>>>>>>> DJ01: I don't clearly understand the flow with the ConsumerGroupHeartbeat
>>>>>>>> API after reading the KIP. There is a new boolean; the KIP states that
>>>>>>>> partition ages are returned only when this boolean is set. Implicitly, this
>>>>>>>> means that when the consumer receives a new partition, it will issue a new
>>>>>>>> HB request with the boolean set to receive the ages. Is my understanding
>>>>>>>> correct? We should perhaps clarify the flow and also explain how it fits
>>>>>>>> into the existing flow (e.g. list offsets, fetch offsets, etc.).
>>>>>>>> DJ02: If my understanding is correct, I wonder if
>>>>>>>> the ConsumerGroupHeartbeat API is the right place for this given that a new
>>>>>>>> round trip is done anyway. Alternatively, it could simply include the
>>>>>>>> metadata. Generally, we should be rather cautious about not overloading
>>>>>>>> the ConsumerGroupHeartbeat API with unrelated concepts. The API is a
>>>>>>>> control plane API for assigning or revoking partitions. The fact that we
>>>>>>>> don't want to add it to the corresponding Streams API also suggests
>>>>>>>> something is not quite right. What would we do if we want to support
>>>>>>>> Streams in the future?
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> David
>>>>>>>> 
>>>>>>>>> On Wed, Apr 29, 2026 at 12:28 AM Muralidhar Basani via dev <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>> 
>>>>>>>>> Hi Jiunn,
>>>>>>>>> 
>>>>>>>>> Thank you for this great KIP. Good to know about the gap.
>>>>>>>>> 
>>>>>>>>> mb-0 - Why is a new v2 version bump needed for the RequestPartitionAges
>>>>>>>>> field? Could a tagged field (for example, PartitionAges on TopicPartitions
>>>>>>>>> in the response) be used here to avoid the version bump?
>>>>>>>>> 
>>>>>>>>> mb-1 - For the new config, is there a recommended value or a ConfigDef
>>>>>>>>> validator? Probably it should be based on metadata.max.age.ms? Sizing
>>>>>>>>> instructions can be part of the Javadocs, I guess.
>>>>>>>>> 
>>>>>>>>> mb-2 - (minor) As there are no changes to Kafka Streams, would it be better
>>>>>>>>> to add this new config auto.offset.reset.latest.max.age to the
>>>>>>>>> StreamsConfig block list (NON_CONFIGURABLE_CONSUMER_DEFAULT_CONFIGS) for a
>>>>>>>>> clear warning, in case users configure it? This is the most familiar
>>>>>>>>> consumer config and users might easily configure it by mistake. Or maybe
>>>>>>>>> it's not worth adding.
>>>>>>>>> 
>>>>>>>>> mb-3 - (minor) The phrasing "the consumer falls back to earliest" reads as
>>>>>>>>> if the config were being changed per-partition, which isn't supported.
>>>>>>>>> Maybe rephrase it to something like "consumer resolves the initial position
>>>>>>>>> to start offset for that partition", as if earliest was applied to that
>>>>>>>>> partition only and the auto.offset.reset config is unchanged.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Murali
>>>>>>>>> 
>>>>>>>>>> On Tue, Apr 28, 2026 at 2:48 PM 黃竣陽 <[email protected]> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi chia,
>>>>>>>>>> 
>>>>>>>>>> I have updated the KIP to include this change.
>>>>>>>>>> 
>>>>>>>>>> Best Regards,
>>>>>>>>>> Jiunn-Yang
>>>>>>>>>> 
>>>>>>>>>>> Chia-Ping Tsai <[email protected]> wrote on April 28, 2026, at 8:03 PM:
>>>>>>>>>>> 
>>>>>>>>>>> hi Jiunn-Yang
>>>>>>>>>>> 
>>>>>>>>>>> chia_0: Should we expose the partition creation time via the Admin API?
>>>>>>>>>>> I assume it would be valuable for users to diagnose and troubleshoot the
>>>>>>>>>>> behavior of auto.offset.reset.latest.max.age.
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Chia-Ping
>>>>>>>>>>> 
>>>>>>>>>>> On 2026/04/28 10:47:58 黃竣陽 wrote:
>>>>>>>>>>>> Hello everyone,
>>>>>>>>>>>> 
>>>>>>>>>>>> I would like to start a discussion on KIP-1327 Prevent Hot Data Loss
>>>>>>>>>>>> on Partition Expansion for Latest Policy
>>>>>>>>>>>> <https://urldefense.com/v3/__https://cwiki.apache.org/confluence/x/KY4mGQ__;!!Ayb5sqE7!qF4q1QzF1RRgP61D7A2xuEai1ky7fepKDKFFvpNBuePikH-ULmT87TvuuZzy5kau5E4y5zMZAmfQQiwZomM$>
>>>>>>>>>>>> 
>>>>>>>>>>>> This proposal aims to introduce auto.offset.reset.latest.max.age, a
>>>>>>>>>>>> consumer config that lets the latest reset policy distinguish newly
>>>>>>>>>>>> expanded (hot) partitions from long-existing (cold) ones. Partitions
>>>>>>>>>>>> younger than the configured threshold automatically fall back to
>>>>>>>>>>>> earliest, preventing silent data loss during topic expansion without
>>>>>>>>>>>> forcing a full historical reprocess.
>>>>>>>>>>>> 
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Jiunn-Yang
