Hi Jun,

Thanks for the feedback. I agree that shifting this policy toward a "Smarter Latest" (rather than a better Earliest) is a more elegant path.

The refined behavior would be:

Out-of-range: Strictly follow latest semantics. This ensures a predictable "skip to end" behavior when users fall behind retention.

No-offset (Initial Start & Expansion): Leverage Group Creation Time for lookup.
• For new groups, this naturally results in latest behavior since creation time is "now".
• For existing groups discovering new partitions, this results in earliest behavior for those specific partitions.

Group GC: If a group is purged, it is treated as a brand-new group with a creation time of "now," consistently skipping to the end.

WDYT?

> Jun Rao via dev <[email protected]> wrote on April 23, 2026 at 1:34 AM:
>
> Hi, Chia-Ping,
>
> Thanks for the reply.
>
> Let's try to understand from the user's perspective. When the user starts
> the group for the first time, it faces a choice on whether to process the
> backlog or not. When the offset is out-of-range, the user faces the same
> choice regarding backlog processing. It seems that most users want to make
> the same choice regarding backlog processing.
>
> "Users who explicitly choose the to_start_time policy do so precisely
> because they do not want to skip any records when encountering an
> out-of-range scenario."
> This argument is weak because that's how to_start_time is designed, but we
> need to justify why it is a good choice in the first place.
>
> Jun
>
>> On Tue, Apr 21, 2026 at 12:35 PM Chia-Ping Tsai <[email protected]> wrote:
>>
>> Hi Jun,
>>
>> Thanks for the clarification. I think I misunderstood your previous point.
>> Let me summarize the scenarios to ensure we are fully aligned.
>>
>> There are essentially three scenarios when a consumer needs to reset
>> offsets:
>>
>> 1. Out-of-range (The group exists, but the offset has expired).
>> 2. Extended partition (The group exists, but encounters a newly added
>>    partition with no committed offset).
>> 3. No-offset (The group is completely new, or an existing group was
>>    deleted by the GC).
>>
>> We all agree that the primary goal of this KIP is to catch up on all
>> records for scenario 2. There are no objections here.
>>
>> Regarding the inconsistency you pointed out between 1) and 3) under the
>> current to_start_time design, I completely see your point. If users are
>> not fully aware that to_start_time is designed to read all records since
>> the creation of the group, they might get confused.
>>
>> However, to me, this "inconsistency" is actually a matter of
>> predictability. Users who explicitly choose the to_start_time policy do
>> so precisely because they do not want to skip any records when encountering
>> an out-of-range scenario.
>>
>> (I would prefer to set aside the topic of group GC for a moment. It is
>> much more important that we first focus our discussion on the
>> "out-of-range" scenario)
>>
>> Best,
>>
>> Chia-Ping
>>
>> Jun Rao via dev <[email protected]> wrote on Wednesday, April 22, 2026 at 1:13 AM:
>>
>>> Hi, Chia-Ping,
>>>
>>> Hmm, is that true? With the earliest policy, we treat an out-of-range
>>> offset the same as no offset (because the group is deleted) and always set
>>> it to the earliest offset, right? With to_start_time, an out-of-range
>>> offset is treated differently from no offset.
>>>
>>> Thanks,
>>>
>>> Jun
>>>
>>> On Tue, Apr 21, 2026 at 12:54 AM Chia-Ping Tsai <[email protected]> wrote:
>>>
>>>> hi Jun
>>>>
>>>> Nice point. Group GC is definitely an issue for to_start_time, but it is
>>>> actually an issue for other policies as well.
>>>>
>>>> For example, a consumer using the earliest policy will suddenly read all
>>>> historical records from scratch if it sleeps for a long while and gets
>>>> GC'd; otherwise, it just resumes from previous offsets if the group still
>>>> exists. It is equally hard to explain to users: "Oh, your group was GC'd,
>>>> so your offset behavior changed."
>>>>
>>>> Therefore, it seems to me the right approach to fix this "inconsistency"
>>>> is to offer a group-level GC timeout in a future KIP, allowing users to
>>>> explicitly protect critical groups from GC. This saves not only
>>>> to_start_time, but all other reset policies too.
>>>>
>>>> Best,
>>>> Chia-Ping
>>>>
>>>> On 2026/04/20 20:19:47 Jun Rao via dev wrote:
>>>>> Hi, Jiunn-Yang and Chia-Ping,
>>>>>
>>>>> Thanks for the reply.
>>>>>
>>>>> The main concern I see with to_start_time is that its behavior on how much
>>>>> data to consume when the offset is out of range is not consistent and is
>>>>> hard to explain. If the group still exists, it will read from the earliest
>>>>> offset. Otherwise, it will read from the latest.
>>>>>
>>>>> Jun
>>>>>
>>>>> On Mon, Apr 20, 2026 at 10:13 AM Chia-Ping Tsai <[email protected]> wrote:
>>>>>
>>>>>> hi all,
>>>>>>
>>>>>> Just a note for a potential latest_v2:
>>>>>>
>>>>>> Since the purpose is to read all records from extended partitions, we
>>>>>> could leverage the group creation time to compare against the earliest
>>>>>> record of a partition when there is no committed offset. If the group
>>>>>> creation time is larger than the earliest record's timestamp, we assume it
>>>>>> is not an extended partition. Otherwise, we treat it as an extended
>>>>>> partition.
>>>>>>
>>>>>> This approach allows us to catch all "possible" extended partitions, which
>>>>>> includes both "true" extended partitions and old but truncated partitions.
>>>>>> While there is a rare edge case where the cost is reprocessing some records
>>>>>> we don't necessarily want, it is very easy to implement and guarantees we
>>>>>> will never miss the actual extended partitions.
>>>>>>
>>>>>> Best,
>>>>>> Chia-Ping
>>>>>>
>>>>>> On 2026/04/20 13:33:31 黃竣陽 wrote:
>>>>>>> Hello all,
>>>>>>>
>>>>>>> I have added a new "Future Work: latest_strict Policy" section to the
>>>>>>> KIP.
>>>>>>> The idea is a future policy that uses latest semantics by default but
>>>>>>> falls back to the group creation timestamp specifically for newly added
>>>>>>> partitions during partition expansion. This would reuse the group creation
>>>>>>> time anchor introduced by this KIP, making it a natural extension with
>>>>>>> minimal additional protocol changes.
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Jiunn-Yang
>>>>>>>
>>>>>>>> Chia-Ping Tsai <[email protected]> wrote on April 18, 2026 at 4:09 PM:
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> It is practically NP-hard to guess everyone's ideal use case right now.
>>>>>>>> Also, I believe we all want to avoid falling back to the intricate
>>>>>>>> multi-policy approach proposed in KIP-842.
>>>>>>>>
>>>>>>>> I prefer to keep this KIP focused and discuss a "v2 latest" policy in a
>>>>>>>> separate KIP. That future policy could build upon the to_start_time anchor
>>>>>>>> to fix data loss specifically for extended partitions. We could call it
>>>>>>>> something like latest_strict.
>>>>>>>>
>>>>>>>> Thoughts?
>>>>>>>>
>>>>>>>>
>>>>>>>> 黃竣陽 <[email protected]> wrote on Saturday, April 18, 2026 at 3:24 PM:
>>>>>>>>
>>>>>>>>> Hello Jun,
>>>>>>>>>
>>>>>>>>> Thanks for the reply,
>>>>>>>>>
>>>>>>>>> When the offset goes out of range, the user faces two options:
>>>>>>>>>
>>>>>>>>> 1. Skip to the end (latest behavior): risk losing data that was produced
>>>>>>>>>    during the group's lifetime but not yet consumed.
>>>>>>>>> 2. Seek back to the group creation time (to_start_time behavior):
>>>>>>>>>    potentially reprocess some data, but guarantee no data from the
>>>>>>>>>    group's lifetime is silently lost.
>>>>>>>>>
>>>>>>>>> to_start_time chooses option 2 because its core promise is "never silently
>>>>>>>>> lose data produced after the group started." If we fell back to latest on
>>>>>>>>> out-of-range, we would break this guarantee.
>>>>>>>>>
>>>>>>>>> I think users who prefer option 1 can simply use
>>>>>>>>> auto.offset.reset=latest.
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>> Jiunn-Yang
>>>>>>>>>
>>>>>>>>>> Jun Rao via dev <[email protected]> wrote on April 18, 2026 at 1:57 AM:
>>>>>>>>>>
>>>>>>>>>> Hi, Jiunn-Yang and Chia-Ping,
>>>>>>>>>>
>>>>>>>>>> Thanks for the reply.
>>>>>>>>>>
>>>>>>>>>> "The core semantic of to_start_time is to read all records since the
>>>>>>>>>> creation of the group."
>>>>>>>>>>
>>>>>>>>>> I am just questioning whether this actually covers a common use case. If
>>>>>>>>>> the offset doesn't go out of range, the logic makes sense to me. I'm not
>>>>>>>>>> sure about the logic if the offset is out of range. If a user chooses to
>>>>>>>>>> skip the historical data when starting the group, it seems the user likely
>>>>>>>>>> wants to do the same if the offset is out of range.
>>>>>>>>>>
>>>>>>>>>> Jun
>>>>>>>>>>
>>>>>>>>>> On Fri, Apr 17, 2026 at 5:23 AM 黃竣陽 <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello Jun,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the feedback,
>>>>>>>>>>>
>>>>>>>>>>> Adding to the points above:
>>>>>>>>>>>
>>>>>>>>>>> Regarding by_duration as an alternative to Scenario 1: beyond clock skew
>>>>>>>>>>> and retry issues, there is also a usability concern. by_duration requires
>>>>>>>>>>> users to reason about operational timing ("how long does partition
>>>>>>>>>>> discovery take in my environment?") and then translate that into a
>>>>>>>>>>> configuration value. to_start_time requires no such reasoning. It simply
>>>>>>>>>>> anchors to the group creation time recorded by the broker.
>>>>>>>>>>>
>>>>>>>>>>> Regarding Scenario 2: I'd also like to clarify that to_start_time does not
>>>>>>>>>>> branch between "use latest" and "use earliest." It applies the same
>>>>>>>>>>> ListOffsetsRequest with the group creation timestamp in all cases. The
>>>>>>>>>>> difference in outcome:
>>>>>>>>>>> - skipping old data on first start
>>>>>>>>>>> - consuming surviving data after truncation
>>>>>>>>>>> is a natural consequence of what data exists in the partition at that
>>>>>>>>>>> point, not a different policy being applied. The rule is always the same.
>>>>>>>>>>>
>>>>>>>>>>> Best Regards,
>>>>>>>>>>> Jiunn-Yang
>>>>>>>>>>>
>>>>>>>>>>>> Chia-Ping Tsai <[email protected]> wrote on April 17, 2026 at 9:48 AM:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Jun Rao via dev <[email protected]> wrote on April 17, 2026 at 4:57 AM:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also, a group is deleted after the consumer has been idle longer
>>>>>>>>>>>>> than offsets.retention.minutes. What's the semantic of to_start_time if
>>>>>>>>>>>>> the group creation time is unavailable?
>>>>>>>>>>>>
>>>>>>>>>>>> If the group is recreated, a new creation time will be recorded. Hence,
>>>>>>>>>>>> it acts like a new group. Plus, it throws an exception directly if the
>>>>>>>>>>>> group truly has no creation time.
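
P.S. The refined behavior described at the top of this message can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the KIP's implementation: ResetReason, PartitionLog, offsetForTime and resetOffset are hypothetical names, the partition is modeled as an in-memory list of record timestamps, and a timestamp lookup that finds no matching record is assumed to fall through to the log end.

```java
import java.util.Arrays;
import java.util.List;

final class SmarterLatestSketch {

    /** Why the consumer has to pick a new position (hypothetical enum). */
    enum ResetReason { OUT_OF_RANGE, NO_COMMITTED_OFFSET }

    /** Toy partition: record timestamps in offset order, first record at logStartOffset. */
    static final class PartitionLog {
        final long logStartOffset;
        final List<Long> timestampsMs;

        PartitionLog(long logStartOffset, List<Long> timestampsMs) {
            this.logStartOffset = logStartOffset;
            this.timestampsMs = timestampsMs;
        }

        long logEndOffset() {
            return logStartOffset + timestampsMs.size();
        }

        /** First offset whose timestamp is >= targetMs; assumed to fall through to the log end if none. */
        long offsetForTime(long targetMs) {
            for (int i = 0; i < timestampsMs.size(); i++) {
                if (timestampsMs.get(i) >= targetMs) {
                    return logStartOffset + i;
                }
            }
            return logEndOffset();
        }
    }

    static long resetOffset(ResetReason reason, long groupCreationTimeMs, PartitionLog log) {
        if (reason == ResetReason.OUT_OF_RANGE) {
            // Out-of-range: strictly latest semantics, skip to the end.
            return log.logEndOffset();
        }
        // No committed offset (initial start, partition expansion, or a group
        // recreated after GC): look up the position by group creation time.
        return log.offsetForTime(groupCreationTimeMs);
    }

    public static void main(String[] args) {
        PartitionLog old = new PartitionLog(10L, Arrays.asList(100L, 200L, 300L));
        PartitionLog added = new PartitionLog(0L, Arrays.asList(200L, 300L));
        // New group created "now" (t=400): lands on the log end, i.e. latest.
        System.out.println(resetOffset(ResetReason.NO_COMMITTED_OFFSET, 400L, old));   // 13
        // Existing group (created at t=150) discovering a newer partition: earliest.
        System.out.println(resetOffset(ResetReason.NO_COMMITTED_OFFSET, 150L, added)); // 0
        // Committed offset fell behind retention: skip to the end.
        System.out.println(resetOffset(ResetReason.OUT_OF_RANGE, 150L, old));          // 13
    }
}
```

A group purged by GC would go through the same no-offset branch with a fresh creation time of "now", so it behaves like the first case and skips to the end.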
