Hi Jiunn-Yang,

Would you mind removing the terms "hot" and "cold" when describing partitions in the KIP? I understand you are using them to describe the "freshness" of the records or the users' need for them, but applying these terms to the partition itself feels a bit unnatural.
After all, in this scenario, users don't really care whether a partition is newly expanded or not. Their only expectation is that they won't silently lose any live records produced to the topic during their active consumption.

Best,
Chia-Ping

黃竣陽 <[email protected]> wrote on Sat, May 30, 2026 at 12:30 PM:

> Hello Jun,
>
> Thanks for the feedback, I have updated the KIP motivation section.
>
> Best Regards,
> Jiunn-Yang
>
> > Jun Rao via dev <[email protected]> wrote on May 30, 2026 at 1:12 AM:
> >
> > Hi, Jiunn-Yang,
> >
> > Thanks for the reply. I think we need a stronger motivation for the KIP.
> >
> > The KIP says "The core insight is that not all partitions without a committed offset are the same. A newly expanded partition (hot) is fundamentally different from a partition the consumer has never seen because it predates the group (cold)." Why is the hot partition fundamentally different from the cold?
> >
> > The KIP says "The existing by_duration policy is also insufficient because:
> >
> > - The calculated seek time (now() - duration) varies across nodes due to clock skew. To be safe, users must set an overly large duration, causing unnecessary reprocessing.
> > - On network errors, the client recalculates the seek time on retry, shifting the target timestamp forward and risking data loss."
> >
> > However, both of these situations are rare. If these issues persist, more severe problems likely exist elsewhere. Rare situations don't need a common solution. If users care about those rare situations, they can implement customized logic using ConsumerRebalanceListener.onPartitionsAssigned().
> >
> > Jun
> >
> > On Sun, May 17, 2026 at 6:50 AM 黃竣陽 <[email protected]> wrote:
> >
> >> Hello chia,
> >>
> >> Thanks for the feedback,
> >>
> >>> If the creation time exists, the returned value should always be greater than or equal to zero, right?
> >> I have explicitly mentioned this in the KIP.
> >>
> >>>> New Old (MetadataResponse v0–13) positive any field absent UnsupportedVersionException
> >>
> >> The earliest point at which we can detect the version mismatch is during the first metadata fetch after assignment, which occurs inside poll(). Therefore, the user would encounter an UnsupportedVersionException from poll(). I'll clarify this in the KIP.
> >>
> >> Best Regards,
> >> Jiunn-Yang
> >>
> >>> Chia-Ping Tsai <[email protected]> wrote on May 17, 2026 at 4:50 PM:
> >>>
> >>> hi Jiunn
> >>>
> >>>> PartitionAgeMs (int64, default -1): The age of this partition in milliseconds, computed server-side by the broker as broker_current_time - partition_creation_time. Returns -1 if the broker does not support this feature or the partition creation time is unknown.
> >>>
> >>> If the creation time exists, the returned value should always be greater than or equal to zero, right?
> >>>
> >>>> New Old (MetadataResponse v0–13) positive any field absent UnsupportedVersionException
> >>>
> >>> Will the user encounter an UnsupportedVersionException when calling `poll()`?
> >>>
> >>> Best,
> >>> Chia-Ping
> >>>
> >>>
> >>> On 2026/05/16 04:30:49 黃竣陽 wrote:
> >>>> Hello Jun, chia,
> >>>>
> >>>> I've updated KIP-1327 with a design change based on the discussion feedback.
> >>>>
> >>>> The updated design decouples the new-partition reset behavior from the base auto.offset.reset policy:
> >>>>
> >>>> - auto.offset.reset.max.age.ms now applies to all auto.offset.reset values (latest, earliest, by_duration, none).
> >>>> - For new ("hot") partitions, the consumer resets according to the auto.offset.reset.new.partitions config setting.
> >>>> - For existing ("cold") partitions, the base auto.offset.reset policy continues to apply unchanged.
> >>>> - The new-partition reset behavior is represented by a separate internal config (auto.offset.reset.new.partitions, currently fixed to earliest).
> >>>>   This decoupled design makes it straightforward to promote the behavior to a public user-facing configuration in a future KIP.
> >>>>
> >>>> Best Regards,
> >>>> Jiunn-Yang
> >>>>
> >>>>
> >>>>> Chia-Ping Tsai <[email protected]> wrote on May 16, 2026 at 7:46 AM:
> >>>>>
> >>>>> hi Jun
> >>>>>
> >>>>> I see what you mean now. My proposal is listed below:
> >>>>>
> >>>>> 1) Add auto.offset.reset.new.partitions with a default value of earliest. It fixes the data loss from both by_duration and latest, and it does not change the logic of auto.offset.reset=earliest.
> >>>>> 2) Mark auto.offset.reset.new.partitions as an internal configuration. auto.offset.reset.new.partitions=earliest already addresses the issue, and we can discuss the use cases of other values in a separate KIP.
> >>>>> 3) Both configs, auto.offset.reset.new.partitions and auto.offset.reset.latest.max.age.ms, will be applied to all auto.offset.reset values for consistency.
> >>>>>
> >>>>> WDYT?
> >>>>>
> >>>>> On 2026/05/15 20:53:20 Jun Rao via dev wrote:
> >>>>>> Hi, Chia-Ping,
> >>>>>>
> >>>>>> Thanks for the reply.
> >>>>>>
> >>>>>> 1. In the motivation section, the KIP says "When a Kafka topic is expanded with new partitions, consumers using the latest auto offset reset policy will silently miss all records produced to those partitions before the consumer discovers them.". If a user sets auto.offset.reset=by_duration=1sec, the same record loss issue could also happen, right?
> >>>>>>
> >>>>>> 2. I was thinking auto.offset.reset.new.partitions will take the same values as auto.offset.reset. So a user could set it to by_duration if needed.
> >>>>>>
> >>>>>> Jun
> >>>>>>
> >>>>>> On Thu, May 14, 2026 at 4:06 PM Chia-Ping Tsai <[email protected]> wrote:
> >>>>>>
> >>>>>>> hi Jun
> >>>>>>>
> >>>>>>> Thanks for the feedback.
> >>>>>>> I might be missing something important from your suggestion, so please bear with me as I try to clarify with a few questions:
> >>>>>>>
> >>>>>>> 1. Is there a strong use case for extending this logic to other reset policies? Unlike latest, policies like earliest or by_duration don't seem to suffer from the same silent data loss issue when a partition is expanded.
> >>>>>>>
> >>>>>>> 2. What values would we expect users to configure for auto.offset.reset.new.partitions? If they set it to earliest or latest, we might run into the exact same edge cases. For example, if a consumer is offline for a while and a new partition is created during that downtime, the user might actually want to skip to latest when resuming, rather than reading from earliest just because the partition is technically "new" to the group.
> >>>>>>>
> >>>>>>> This is exactly why we opted for introducing a max.age threshold. It gives users a time-bound way to define what is genuinely "hot/new" and what is just an old partition they haven't seen yet.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Chia-Ping
> >>>>>>>
> >>>>>>> On 2026/05/14 20:48:09 Jun Rao via dev wrote:
> >>>>>>>> Hi, Jiunn-Yang,
> >>>>>>>>
> >>>>>>>> Thanks for the KIP.
> >>>>>>>>
> >>>>>>>> I find auto.offset.reset.latest.max.age a bit weird. It only applies when auto.offset.reset is latest. However, it seems that the motivation equally applies when auto.offset.reset is set to other values like by_duration. The intention is that we want to have a separate way to control newly created partitions vs existing partitions when the group starts. Have we considered adding a new config like auto.offset.reset.new.partitions?
> >>>>>>>> If this new config is not set, the offset reset policy defaults to the policy used for existing partitions. The user could set it explicitly to customize the behavior for new partitions.
> >>>>>>>>
> >>>>>>>> Jun
> >>>>>>>>
> >>>>>>>> On Thu, May 7, 2026 at 5:07 AM 黃竣陽 <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> Hi all,
> >>>>>>>>>
> >>>>>>>>> I'd like to manually bump this thread.
> >>>>>>>>>
> >>>>>>>>> Best Regards,
> >>>>>>>>> Jiunn-Yang
> >>>>>>>>>
> >>>>>>>>>> 黃竣陽 <[email protected]> wrote on May 1, 2026 at 10:37 PM:
> >>>>>>>>>>
> >>>>>>>>>> Hello all,
> >>>>>>>>>>
> >>>>>>>>>> Thanks for the feedback.
> >>>>>>>>>>
> >>>>>>>>>> DJ01/DJ02:
> >>>>>>>>>>
> >>>>>>>>>> MetadataResponse bumps from v13 to v14. The PartitionMetadata struct gains a new field PartitionAgeMs (int64, default -1), computed server-side by the broker as broker_current_time - partition_creation_time.
> >>>>>>>>>>
> >>>>>>>>>> I also added the consumer heartbeat flow: when MembershipManager detects a newly assigned partition, it explicitly invalidates the metadata for the affected topic and forces a fresh MetadataRequest before making the offset reset decision, even if the topic ID is already in the cache.
> >>>>>>>>>>
> >>>>>>>>>> MB0:
> >>>>>>>>>>
> >>>>>>>>>> The consumer learns the broker's maximum supported MetadataResponse version via the ApiVersions negotiation at connection time. If the negotiated version is lower than v14, the consumer knows the broker does not support PartitionAgeMs at all and can throw an UnsupportedVersionException immediately, rather than silently falling back to latest and risking data loss without any operator-visible signal.
> >>>>>>>>>>
> >>>>>>>>>> MB1/MB2/MB3:
> >>>>>>>>>>
> >>>>>>>>>> I have addressed these changes in the KIP.
> >>>>>>>>>>
> >>>>>>>>>> Best Regards,
> >>>>>>>>>> Jiunn-Yang
> >>>>>>>>>>
> >>>>>>>>>>> Chia-Ping Tsai <[email protected]> wrote on April 29, 2026 at 4:04 PM:
> >>>>>>>>>>>
> >>>>>>>>>>> hi David
> >>>>>>>>>>>
> >>>>>>>>>>> I agree with the direction of moving the 'age' resolution from the Heartbeat API to the Metadata API to keep the control plane clean. The main trade-off, as we noted before, is introducing inter-broker clock skew. The Group Coordinator approach provided a single source of truth for time.
> >>>>>>>>>>>
> >>>>>>>>>>> However, realistically, this time skew should be negligible. Given that the max.age threshold will likely be configured in minutes or hours, a typical NTP skew (in milliseconds) between brokers won't impact the fallback decision.
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>> Chia-Ping
> >>>>>>>>>>>
> >>>>>>>>>>>> David Jacot via dev <[email protected]> wrote on April 29, 2026 at 3:29 PM:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks for the KIP!
> >>>>>>>>>>>>
> >>>>>>>>>>>> Sorry, I haven't really followed the previous conversation but I took a quick look at this one.
> >>>>>>>>>>>>
> >>>>>>>>>>>> DJ01: I don't clearly understand the flow with the ConsumerGroupHeartbeat API after reading the KIP. There is a new boolean; the KIP states that partition ages are returned only when this boolean is set. Implicitly, this means that when the consumer receives a new partition, it will issue a new HB request with the boolean set to receive the ages. Is my understanding correct?
> >>>>>>>>>>>> We should perhaps clarify the flow and also explain how it fits into the existing flow (e.g. list offsets, fetch offsets, etc.).
> >>>>>>>>>>>> DJ02: If my understanding is correct, I wonder if the ConsumerGroupHeartbeat API is the right place for this given that a new round trip is done anyway. Alternatively, it could simply include the metadata. Generally, we should be rather cautious about not overloading the ConsumerGroupHeartbeat API with unrelated concepts. The API is a control plane API for assigning or revoking partitions. The fact that we don't want to add it to the corresponding Streams API also suggests something is not quite right. What would we do if we want to support Streams in the future?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>> David
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Wed, Apr 29, 2026 at 12:28 AM Muralidhar Basani via dev <[email protected]> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi Jiunn,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thank you for this great KIP. Good to know about the gap.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> mb-0 - Why is a new v2 version bump needed for the RequestPartitionAges field? Can a tagged field (for example, on the response, PartitionAges on TopicPartitions) be used here to avoid the version bump?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> mb-1 - For the new config, is there a recommended value or a ConfigDef validator? Probably it should be based on metadata.max.age.ms? Sizing instructions can be part of the javadocs, I guess.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> mb-2 - (minor) As there are no changes to Kafka Streams, would it be better to add this new config auto.offset.reset.latest.max.age to the StreamsConfig block list (NON_CONFIGURABLE_CONSUMER_DEFAULT_CONFIGS) for a clear warning, in case users configure it? This is the most familiar consumer config and users might easily configure it by mistake. Or maybe it's not worth adding.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> mb-3 - (minor) The phrasing "the consumer falls back to earliest" reads as if the config were being changed per-partition, which isn't supported. Maybe rephrase it to something like "consumer resolves the initial position to start offset for that partition", as if earliest was applied to that partition only and the auto.offset.reset config is unchanged.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> Murali
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Tue, Apr 28, 2026 at 2:48 PM 黃竣陽 <[email protected]> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi chia,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I have updated the KIP to include this change.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best Regards,
> >>>>>>>>>>>>>> Jiunn-Yang
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Chia-Ping Tsai <[email protected]> wrote on April 28, 2026 at 8:03 PM:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> hi Jiunn-Yang
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> chia_0: Should we expose the partition creation time via the Admin API?
> >>>>>>>>>>>>>>> I assume it would be valuable for users to diagnose and troubleshoot the behavior of auto.offset.reset.latest.max.age.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>> Chia-Ping
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On 2026/04/28 10:47:58 黃竣陽 wrote:
> >>>>>>>>>>>>>>>> Hello everyone,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I would like to start a discussion on KIP-1327 Prevent Hot Data Loss on Partition Expansion for Latest Policy
> >>>>>>>>>>>>>>>> <https://urldefense.com/v3/__https://cwiki.apache.org/confluence/x/KY4mGQ__;!!Ayb5sqE7!qF4q1QzF1RRgP61D7A2xuEai1ky7fepKDKFFvpNBuePikH-ULmT87TvuuZzy5kau5E4y5zMZAmfQQiwZomM$>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> This proposal aims to introduce auto.offset.reset.latest.max.age, a consumer config that lets the latest reset policy distinguish newly expanded (hot) partitions from long-existing (cold) ones. Partitions younger than the configured threshold automatically fall back to earliest, preventing silent data loss during topic expansion without forcing a full historical reprocess.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>> Jiunn-Yang
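
For reference, the reset decision discussed in this thread could be sketched roughly as below. This is only an illustration in Python, not the consumer implementation: the function names are hypothetical, clamping the age at zero is an assumption about how a non-negative value would be guaranteed, and falling back to the base policy when the age is unknown (-1) is also an assumption, since the thread only specifies the UnsupportedVersionException case for older brokers.

```python
from typing import Optional

# The thread states PartitionAgeMs defaults to -1 when the creation time is unknown.
UNKNOWN_AGE_MS = -1


def partition_age_ms(broker_now_ms: int, creation_time_ms: Optional[int]) -> int:
    """Age as described in the thread: broker_current_time - partition_creation_time."""
    if creation_time_ms is None:
        return UNKNOWN_AGE_MS
    # Assumption: clamp at zero so a known creation time never yields a negative age.
    return max(0, broker_now_ms - creation_time_ms)


def resolve_reset_policy(
    age_ms: int,
    max_age_ms: int,
    base_policy: str,
    new_partition_policy: str = "earliest",  # the thread fixes this internal config to earliest
) -> str:
    """Pick the reset policy for a partition that has no committed offset."""
    # Partitions younger than the threshold use the new-partition policy.
    if age_ms != UNKNOWN_AGE_MS and age_ms < max_age_ms:
        return new_partition_policy
    # Assumption: older partitions and unknown ages keep the base auto.offset.reset policy.
    return base_policy


if __name__ == "__main__":
    age = partition_age_ms(broker_now_ms=10_000, creation_time_ms=4_000)
    print(age, resolve_reset_policy(age, max_age_ms=60_000, base_policy="latest"))
```

With a one-minute threshold, a partition created six seconds ago resolves to earliest, while one created two minutes ago keeps the base policy.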
