Hello chia, Thanks for the comments, I have updated the KIP!
Best Regards, Jiunn-Yang > Chia-Ping Tsai <[email protected]> 於 2026年5月30日 晚上8:29 寫道: > > Hi Jiunn-Yang, > > Would you mind removing the terms "hot" and "cold" when describing > partitions in the KIP? I understand you are using them to describe the > "freshness" or the users' need for the records, but applying these terms to > the partition itself feels a bit unnatural. > > After all, in this scenario, users don't really care whether a partition is > newly expanded or not. Their only expectation is that they won't silently > lose any live records produced to the topic during their active consumption. > > Best, Chia-Ping > > > > 黃竣陽 <[email protected]> 於 2026年5月30日週六 下午12:30寫道: > >> Hello Jun, >> >> Thanks for the feedback, I have updated the KIP motivation section. >> >> Best Regards, >> Jiunn-Yang >> >>> Jun Rao via dev <[email protected]> 於 2026年5月30日 凌晨1:12 寫道: >>> >>> Hi, Jiunn-Yang, >>> >>> Thanks for the reply. I think we need a stronger motivation for the KIP. >>> >>> The KIP says "The core insight is that not all partitions without a >>> committed offset are the same. A newly expanded partition (hot) is >>> fundamentally different from a partition the consumer has never seen >>> because it predates the group (cold)." Why is the hot partition >>> fundamentally different from the cold? >>> >>> The KIP says "The existing by_duration policy is also insufficient >> because: >>> >>> - The calculated seek time (now() - duration) varies across nodes due >> to >>> clock skew. To be safe, users must set an overly large duration, >> causing >>> unnecessary reprocessing. >>> - On network errors, the client recalculates the seek time on retry, >>> shifting the target timestamp forward and risking data loss." >>> >>> However, both of these situations are rare. If these issues persist, more >>> severe problems likely exist elsewhere. Rare situations don't need a >> common >>> solution. If users care about those rare situations, they can implement >>> customized logic using ConsumerRebalanceListener.onPartitionsAssigned(). >>> >>> Jun >>> >>> >>> On Sun, May 17, 2026 at 6:50 AM 黃竣陽 <[email protected]> wrote: >>> >>>> Hello chia, >>>> >>>> Thanks for the feedback, >>>> >>>>> If the creation time exists, the returned value should always be >> greater >>>> than or equal to zero, right? >>>> I have explicitly mentioned this in the KIP. >>>> >>>>>> New Old (MetadataResponse v0–13) positive any field >>>> absent UnsupportedVersionException >>>> >>>> The earliest point at which we can detect the version mismatch is during >>>> the >>>> first metadata fetch after assignment, which occurs inside poll(). >>>> Therefore, the >>>> user would encounter an UnsupportedVersionException from poll(). I’ll >>>> clarify this in the KIP. >>>> >>>> Best Regards, >>>> Jiunn-Yang >>>> >>>>> Chia-Ping Tsai <[email protected]> 於 2026年5月17日 下午4:50 寫道: >>>>> >>>>> hi Jiunn >>>>> >>>>>> PartitionAgeMs (int64, default -1): The age of this partition in >>>> milliseconds, computed server-side by the broker as broker_current_time >> - >>>> partition_creation_time. Returns -1 if the broker does not support this >>>> feature or the partition creation time is unknown. >>>>> >>>>> If the creation time exists, the returned value should always be >> greater >>>> than or equal to zero, right? >>>>> >>>>>> New Old (MetadataResponse v0–13) positive any field >>>> absent UnsupportedVersionException >>>>> >>>>> Will user encounter UnsupportedVersionException when calling `poll()`? >>>>> >>>>> Best, >>>>> Chia-Ping >>>>> >>>>> >>>>> On 2026/05/16 04:30:49 黃竣陽 wrote: >>>>>> Hello Jun, chia, >>>>>> >>>>>> I've updated KIP-1327 with a design change based on the discussion >>>>>> feedback. >>>>>> >>>>>> The updated design decouples the new-partition reset behavior from >>>>>> the base auto.offset.reset policy: >>>>>> >>>>>> - auto.offset.reset.max.age.ms now applies to all auto.offset.reset >>>> values >>>>>> (latest, earliest, by_duration, none). >>>>>> - For new ("hot") partitions, the consumer resets to >>>> auto.offset.reset.new.partitions >>>>>> config setting >>>>>> - For existing ("cold") partitions, the base auto.offset.reset policy >>>> continues >>>>>> to apply unchanged. >>>>>> - The new-partition reset behavior is represented by a separate >>>> internal config >>>>>> (auto.offset.reset.new.partitions, currently fixed to earliest). This >>>> decoupled design makes >>>>>> it straightforward to promote the behavior to a public user-facing >>>> configuration in a future KIP. >>>>>> >>>>>> Best Regards, >>>>>> Jiunn-Yang >>>>>> >>>>>> >>>>>>> Chia-Ping Tsai <[email protected]> 於 2026年5月16日 清晨7:46 寫道: >>>>>>> >>>>>>> hi Jun >>>>>>> >>>>>>> I see what you mean now. The proposal from me is listed below: >>>>>>> >>>>>>> 1) Add auto.offset.reset.new.partitions with a default value of >>>> earliest. It fixes the data loss from both by_duration and latest, and >> it >>>> does not change the logic of auto.offset.reset=earliest. >>>>>>> 2) Mark auto.offset.reset.new.partitions as an internal >>>> configuration. auto.offset.reset.new.partitions=earliest already >>>> addresses the issue, and we can discuss the use cases of other values >> in a >>>> separate KIP. >>>>>>> 3) Both configs, auto.offset.reset.new.partitions and >>>> auto.offset.reset.latest.max.age.ms, will be applied to all for >>>> consistency. >>>>>>> >>>>>>> WDYT? >>>>>>> >>>>>>> On 2026/05/15 20:53:20 Jun Rao via dev wrote: >>>>>>>> Hi, Chia-Ping, >>>>>>>> >>>>>>>> Thanks for the reply. >>>>>>>> >>>>>>>> 1. In the motivation section, the KIP says "When a Kafka topic is >>>> expanded >>>>>>>> with new partitions, consumers using the latest auto offset reset >>>> policy >>>>>>>> will silently miss all records produced to those partitions before >> the >>>>>>>> consumer discovers them.". If a user sets >>>>>>>> auto.offset.reset=by_duration=1sec, the same record loss issue could >>>> also >>>>>>>> happen, right? >>>>>>>> >>>>>>>> 2. I was thinking auto.offset.reset.new.partitions will take the >> same >>>>>>>> values as auto.offset.reset. So a user could set it by_duration if >>>> needed. >>>>>>>> >>>>>>>> Jun >>>>>>>> >>>>>>>> On Thu, May 14, 2026 at 4:06 PM Chia-Ping Tsai <[email protected] >>> >>>> wrote: >>>>>>>> >>>>>>>>> hi Jun >>>>>>>>> >>>>>>>>> Thanks for the feedback. I might be missing something important >> from >>>> your >>>>>>>>> suggestion, so please bear with me as I try to clarify with a few >>>> questions: >>>>>>>>> >>>>>>>>> 1. Is there a strong use case for extending this logic to other >> reset >>>>>>>>> policies? Unlike latest, policies like earliest or by_duration >> don't >>>> seem >>>>>>>>> to suffer from the same silent data loss issue when a partition is >>>> expanded. >>>>>>>>> >>>>>>>>> 2. What values would we expect users to configure for >>>>>>>>> auto.offset.reset.new.partitions? If they set it to earliest or >>>> latest, >>>>>>>>> we might run into the exact same edge cases. For example, if a >>>> consumer is >>>>>>>>> offline for a while and a new partition is created during that >>>> downtime, >>>>>>>>> the user might actually want to skip to latest when resuming, >> rather >>>> than >>>>>>>>> reading from earliest just because the partition is technically >>>> "new" to >>>>>>>>> the group. >>>>>>>>> >>>>>>>>> This is exactly why we opted for introducing a max.age threshold. >> It >>>> gives >>>>>>>>> users a time-bound way to define what is genuinely "hot/new" and >>>> what is >>>>>>>>> just an old partition they haven't seen yet. >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Chia-Ping >>>>>>>>> >>>>>>>>> On 2026/05/14 20:48:09 Jun Rao via dev wrote: >>>>>>>>>> Hi, Jiunn-Yang, >>>>>>>>>> >>>>>>>>>> Thanks for the KIP. >>>>>>>>>> >>>>>>>>>> I find auto.offset.reset.latest.max.age a bit weird. It only >>>> applies when >>>>>>>>>> auto.offset.reset is latest. However, it seems that the motivation >>>>>>>>> equally >>>>>>>>>> applies when auto.offset.reset is set to other values like >>>> by_duration. >>>>>>>>> The >>>>>>>>>> intention is that we want to have a separate way to control newly >>>> created >>>>>>>>>> partitions vs existing partitions when the group starts. Have we >>>>>>>>> considered >>>>>>>>>> adding a new config like auto.offset.reset.new.partitions? If >> this >>>> new >>>>>>>>>> config is not set, the offset reset policy defaults to the policy >>>> used >>>>>>>>> for >>>>>>>>>> existing partitions. The user could set it explicitly to customize >>>> the >>>>>>>>>> behavior for new partitions. >>>>>>>>>> >>>>>>>>>> Jun >>>>>>>>>> >>>>>>>>>> On Thu, May 7, 2026 at 5:07 AM 黃竣陽 <[email protected]> wrote: >>>>>>>>>> >>>>>>>>>>> Hi all, >>>>>>>>>>> >>>>>>>>>>> I’d like to manually bump this thread. >>>>>>>>>>> >>>>>>>>>>> Best Regards, >>>>>>>>>>> Jiunn-Yang >>>>>>>>>>> >>>>>>>>>>>> 黃竣陽 <[email protected]> 於 2026年5月1日 晚上10:37 寫道: >>>>>>>>>>>> >>>>>>>>>>>> Hello all, >>>>>>>>>>>> >>>>>>>>>>>> Thanks for the feedback. >>>>>>>>>>>> >>>>>>>>>>>> DJ01/DJ02: >>>>>>>>>>>> >>>>>>>>>>>> MetadataResponse bumps from v13 to v14. The PartitionMetadata >>>> struct >>>>>>>>>>> gains a new >>>>>>>>>>>> field PartitionAgeMs (int64, default -1), computed server-side >> by >>>> the >>>>>>>>>>> broker as >>>>>>>>>>>> broker_current_time - partition_creation_time. >>>>>>>>>>>> >>>>>>>>>>>> Also add the consumer heartbeat flow. when MembershipManager >>>> detects >>>>>>>>> a >>>>>>>>>>> newly assigned >>>>>>>>>>>> partition, it explicitly invalidates the metadata for the >> affected >>>>>>>>> topic >>>>>>>>>>> and forces a fresh MetadataRequest >>>>>>>>>>>> before making the offset reset decision, even if the topic ID is >>>>>>>>> already >>>>>>>>>>> in the cache. >>>>>>>>>>>> >>>>>>>>>>>> MB0: >>>>>>>>>>>> >>>>>>>>>>>> The consumer learns the broker's maximum supported >>>> MetadataResponse >>>>>>>>>>> version via the >>>>>>>>>>>> ApiVersions negotiation at connection time. If the negotiated >>>>>>>>> version is >>>>>>>>>>> unsupported, the consumer >>>>>>>>>>>> knows the broker does not support PartitionAgeMs at all and can >>>>>>>>> throw an >>>>>>>>>>> UnsupportedVersionException >>>>>>>>>>>> immediately, rather than silently falling back to latest and >>>> risking >>>>>>>>>>> data loss without any operator-visible signal. >>>>>>>>>>>> >>>>>>>>>>>> MB1/MB2/MB3: >>>>>>>>>>>> >>>>>>>>>>>> I have addressed these changes in the KIP. >>>>>>>>>>>> >>>>>>>>>>>> Best Regards, >>>>>>>>>>>> Jiunn-Yang >>>>>>>>>>>> >>>>>>>>>>>>> Chia-Ping Tsai <[email protected]> 於 2026年4月29日 下午4:04 寫道: >>>>>>>>>>>>> >>>>>>>>>>>>> hi David >>>>>>>>>>>>> >>>>>>>>>>>>> I agree with the direction of moving the 'age' resolution from >>>> the >>>>>>>>>>> Heartbeat API to the Metadata API to keep the control plane >> clean. >>>> The >>>>>>>>> main >>>>>>>>>>> trade-off, as we noted before, is introducing inter-broker clock >>>> skew. >>>>>>>>> The >>>>>>>>>>> Group Coordinator approach provided a single source of truth for >>>> time. >>>>>>>>>>>>> >>>>>>>>>>>>> However, realistically, this time skew should be negligible. >>>> Given >>>>>>>>> that >>>>>>>>>>> the max.age threshold will likely be configured in minutes or >>>> hours, a >>>>>>>>>>> typical NTP skew (in milliseconds) between brokers won't impact >> the >>>>>>>>>>> fallback decision. >>>>>>>>>>>>> >>>>>>>>>>>>> Best, >>>>>>>>>>>>> Chia-Ping >>>>>>>>>>>>> >>>>>>>>>>>>>> David Jacot via dev <[email protected]> 於 2026年4月29日 >> 下午3:29 >>>> 寫道: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks for the KIP! >>>>>>>>>>>>>> >>>>>>>>>>>>>> Sorry, I haven't really followed the previous conversation >> but I >>>>>>>>> took a >>>>>>>>>>>>>> quick look at this one. >>>>>>>>>>>>>> >>>>>>>>>>>>>> DJ01: I don't clearly understand the flow with the >>>>>>>>>>> ConsumerGroupHeartbeat >>>>>>>>>>>>>> API after reading the KIP. There is a new boolean; the KIP >>>> states >>>>>>>>> that >>>>>>>>>>>>>> partition ages are returned only when this boolean is set. >>>>>>>>> Implicitly, >>>>>>>>>>> this >>>>>>>>>>>>>> means that when the consumer receives a new partition, it will >>>>>>>>> issue a >>>>>>>>>>> new >>>>>>>>>>>>>> HB request with the boolean set to receive the ages. Is my >>>>>>>>>>> understanding >>>>>>>>>>>>>> correct? We should perhaps clarify the flow and also explain >>>> how it >>>>>>>>>>> fits >>>>>>>>>>>>>> into the existing flow (e.g. list offsets, fetch offsets, >> etc.). >>>>>>>>>>>>>> DJ02: It my understanding is correct, I wonder if >>>>>>>>>>>>>> the ConsumerGroupHeartbeat API is the right place for this >> given >>>>>>>>> that >>>>>>>>>>> a new >>>>>>>>>>>>>> round trip is done anyway. Alternatively, it could simply >>>> include >>>>>>>>> the >>>>>>>>>>>>>> metadata. Generally, we should be rather cautious about not >>>>>>>>> overloading >>>>>>>>>>>>>> the ConsumerGroupHeartbeat API with unrelated concepts. The >> API >>>> is >>>>>>>>> a >>>>>>>>>>>>>> control plane API for assigning or revoking partitions. The >> fact >>>>>>>>> that >>>>>>>>>>> we >>>>>>>>>>>>>> don't want to add it to the corresponding Streams API also >>>> suggests >>>>>>>>>>>>>> something is not quite right. What would we do if we want to >>>>>>>>> support >>>>>>>>>>>>>> Streams in the future? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Best, >>>>>>>>>>>>>> David >>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, Apr 29, 2026 at 12:28 AM Muralidhar Basani via dev < >>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Jiunn, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thank you for this great kip. Good to know about the gap. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> mb-0 - why a new v2 version bump for RequestPartitionAges >>>> field. >>>>>>>>> Can a >>>>>>>>>>>>>>> tagged field (for ex: on response, PartitionAges on >>>>>>>>> TopicPartitions) >>>>>>>>>>> be >>>>>>>>>>>>>>> used here and avoid version bump? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> mb-1 - For the new config, is there a recommended value or a >>>>>>>>> ConfigDef >>>>>>>>>>>>>>> validator? Probably it should based on the >> metadata.max.age.ms >>>> ? >>>>>>>>>>> Sizing >>>>>>>>>>>>>>> instructions can be part of javadocs I guess. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> mb-2 - (minor) As there are no changes to Kafka Streams, >> would >>>> it >>>>>>>>> be >>>>>>>>>>> better >>>>>>>>>>>>>>> to add this new config auto.offset.reset.latest.max.age to >> the >>>>>>>>>>>>>>> StreamsConfig block list >>>>>>>>> (NON_CONFIGURABLE_CONSUMER_DEFAULT_CONFIGS) >>>>>>>>>>> for a >>>>>>>>>>>>>>> clear warning, incase users configure it? This is the most >>>>>>>>> familiar >>>>>>>>>>>>>>> consumer config and users might easily mistakenly configure >>>> it. Or >>>>>>>>>>> may be >>>>>>>>>>>>>>> it's not worth it to add. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> mb-3 - (minor) The phrasing "the consumer falls back to >>>> earliest" >>>>>>>>>>> reads as >>>>>>>>>>>>>>> if the config were being changed per-partition which isn't >>>>>>>>> supported. >>>>>>>>>>> May >>>>>>>>>>>>>>> be rephrasing to something like "consumer resolves the >> initial >>>>>>>>>>> position to >>>>>>>>>>>>>>> start offset for that partition" as if earliest was applied >> to >>>>>>>>> that >>>>>>>>>>>>>>> partition only and auto.offset.reset config is unchanged. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> Murali >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Tue, Apr 28, 2026 at 2:48 PM 黃竣陽 <[email protected]> >>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi chia, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I have updated the KIP to include this change. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Best Regards, >>>>>>>>>>>>>>>> Jiunn-Yang >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Chia-Ping Tsai <[email protected]> 於 2026年4月28日 晚上8:03 >> 寫道: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> hi Jiunn-Yang >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> chia_0: Should we expose the partition creation time via >> the >>>>>>>>> Admin >>>>>>>>>>> API? >>>>>>>>>>>>>>>> I assume it would be valuable for users to diagnose and >>>>>>>>> troubleshoot >>>>>>>>>>> the >>>>>>>>>>>>>>>> behavior of auto.offset.reset.latest.max.age >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>> Chia-Ping >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On 2026/04/28 10:47:58 黃竣陽 wrote: >>>>>>>>>>>>>>>>>> Hello everyone, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I would like to start a discussion on KIP-1327 Prevent Hot >>>> Data >>>>>>>>>>> Loss >>>>>>>>>>>>>>> on >>>>>>>>>>>>>>>> Partition Expansion for Latest Policy >>>>>>>>>>>>>>>>>> < >>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>> >> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/x/KY4mGQ__;!!Ayb5sqE7!qF4q1QzF1RRgP61D7A2xuEai1ky7fepKDKFFvpNBuePikH-ULmT87TvuuZzy5kau5E4y5zMZAmfQQiwZomM$ >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> This proposal aims to introduces >>>>>>>>> auto.offset.reset.latest.max.age, >>>>>>>>>>> a >>>>>>>>>>>>>>>> consumer config that lets the >>>>>>>>>>>>>>>>>> latest reset policy distinguish newly expanded (hot) >>>> partitions >>>>>>>>>>> from >>>>>>>>>>>>>>>> long-existing (cold) ones. Partitions >>>>>>>>>>>>>>>>>> younger than the configured threshold automatically fall >>>> back >>>>>>>>> to >>>>>>>>>>>>>>>> earliest, preventing silent data loss >>>>>>>>>>>>>>>>>> during topic expansion without forcing a full historical >>>>>>>>> reprocess. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Best regards, >>>>>>>>>>>>>>>>>> Jiunn-Yang >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>> >>>>>> >>>> >>>> >> >>
