> LM1: About group.initial.rebalance.delay.ms, I expect the interaction
> with the interval is just as described for the streams initial delay and
> interval, correct? Should we clarify that in the KIP (it only mentions the
> streams case)

We haven't added a consumer or share group initial.rebalance.delay.ms
config yet. It only exists for streams right now.

> LM2: The KIP refers to batching assignment re-calculations triggered by
> member subscription changes, but I expect the batching mechanism applies
> the same when the assignment re-calculation is triggered by metadata
> changes (i.e. topic/partition created or deleted), without any HB changing
> subscriptions. Is my understanding correct?

Yes, that's right. Topic metadata changes also bump the group epoch and
trigger the same assignment flow.

> LM3: About this section: "When there is an in-flight assignor run for the
> group, there is no new target assignment. We will trigger the next assignor
> run on a future heartbeat.". I expect that the next assignor run will be
> triggered on the next HB from this or from any other member of the group,
> received after the interval expires (without the members re-sending the
> subscription change). Is my expectation correct? If so, it may be worth
> clarifying in the KIP to avoid confusion with client-side implementations.

I tried to clarify in the KIP. Let me know your thoughts!

On Thu, Jan 22, 2026 at 10:56 AM Sean Quah <[email protected]> wrote:

>> dl01: Could we mention the handling when the group metadata or
>> topic partition metadata is changed or deleted during the async assignor
>> run?
>
> Thanks! I've added a paragraph to the Assignment Offload section
> describing the handling of group metadata changes. Topic metadata changes
> already bump the group epoch and we don't need to handle them specially.
>
>> dl02: This might be a question for the overall coordinator executor - do
>> we have plans to apply an explicit size limit to the executor queue? If
>> many groups trigger offloaded assignments simultaneously, should we apply
>> some backpressure for protection?
>
> There aren't any plans for that right now. We actually don't have a size
> limit for the event processor queue either.
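As a side note for anyone following the thread, the batching behaviour discussed under LM2/LM3 and dl02 can be sketched roughly like this. All names here are illustrative; this is not the actual group coordinator code, just a minimal model of the heartbeat-driven trigger:

```java
// Rough sketch of the heartbeat-driven batching discussed above.
// All names are illustrative; this is not the actual coordinator code.
public class AssignorBatchingSketch {
    long groupEpoch = 0;             // bumped by subscription or topic metadata changes
    long targetAssignmentEpoch = 0;  // epoch the last target assignment was computed at
    long lastAssignmentMs = 0;       // timestamp stored with the assignment record
    boolean assignorInFlight = false;

    // Called on every heartbeat, from any member of the group. A new
    // assignor run starts only when the group epoch is ahead of the target
    // assignment epoch, no run is in flight, and the minimum interval has
    // elapsed. Members do not need to re-send their subscription changes.
    boolean maybeTriggerAssignor(long nowMs, long minIntervalMs) {
        if (groupEpoch <= targetAssignmentEpoch) return false;      // nothing new to compute
        if (assignorInFlight) return false;                         // wait for in-flight run
        if (nowMs - lastAssignmentMs < minIntervalMs) return false; // still batching
        assignorInFlight = true;
        return true;
    }

    // Called when the offloaded run finishes and the new target assignment
    // is written to the log. computedEpoch is the group epoch the run saw
    // when it started; if the group epoch moved on since, a later heartbeat
    // triggers the next run.
    void onAssignorCompleted(long nowMs, long computedEpoch) {
        assignorInFlight = false;
        targetAssignmentEpoch = computedEpoch;
        lastAssignmentMs = nowMs;
    }
}
```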
> On Thu, Jan 22, 2026 at 10:56 AM Sean Quah <[email protected]> wrote:
>
>> Hi all, thanks for the feedback so far.
>>
>>> dj01: In the proposed changes section, you state that the timestamp of
>>> the last assignment is not persisted. How do you plan to bookkeep it if it
>>> is not stored with the assignment? Intuitively, I would add a timestamp to
>>> the assignment record.
>>
>> Thinking about it, it's easier to add it to the assignment record. I will
>> update the KIP. One thing to note is that the timestamp will be subject to
>> rollbacks when writing to the log fails, so we can allow extra assignment
>> runs when that happens.
>>
>>> dj02: I wonder whether we should also add a "thread idle ratio" metric
>>> for the group coordinator executor. What do you think?
>>
>> I think it could be useful so I've added it to the KIP. The
>> implementation will have to be different to the event processor, since we
>> currently use an ExecutorService.
>>
>>> dj03: If the executor is not used by the share coordinator, it should not
>>> expose any metrics about it. Is it possible to remove them?
>>
>> I've removed them from the KIP. We can add a parameter to the coordinator
>> metrics class to control whether they are visible.
>>
>>> dj04: Is having one group coordinator executor thread sufficient by
>>> default for common workloads?
>>
>> Yes and no. I expect it will be very difficult to overload an entire
>> thread, i.e. submit work faster than it can complete it. But updating the
>> default to two threads could be good for reducing delays due to
>> simultaneous assignor runs. I've raised the default to 2 threads.
>>
>>> dj05: It seems you propose enabling the minimum assignor interval with a
>>> default of 5 seconds. However, the offloading is not enabled by default. Is
>>> the first one enough to guarantee the stability of the group coordinator?
>>> How do you foresee enabling the second one in the future? It would be great
>>> if you could address this in the KIP. We need a clear motivation for
>>> changing the default behavior and a plan for the future.
>>
>> I initially thought that offloading would increase rebalance times by 1
>> heartbeat and so didn't propose turning it on by default. But after some
>> more thinking, I believe both features will increase rebalance times by 1
>> heartbeat interval and the increase shouldn't stack. The minimum assignor
>> interval only impacts groups with more than 2 members, while offloading
>> only impacts groups with a single member. This is because in the other
>> cases, the extra delays are folded into existing revocation + heartbeat
>> delays. Note that share groups have no revocation so always see increased
>> rebalance times. I've updated the KIP to add the analysis of rebalance
>> times and propose turning both features on by default.
>>
>>> dj06: Based on its description, I wonder whether
>>> `consumer.min.assignor.interval.ms` should be called
>>> `consumer.min.assignment.interval.ms`. What do you think?
>>
>> Thanks, I've renamed the config options in the KIP. What about the
>> assignor.offload.enable configs?
>>
>>> dj07: It is not possible to enable/disable the offloading at the group
>>> level. This makes sense to me but it would be great to explain the
>>> rationale for it in the KIP.
>>
>> Thinking about it, there's nothing stopping us from configuring
>> offloading at the group level. In fact it might be desirable for some users
>> to disable offloading at the group coordinator level to keep rebalances
>> fast and only enable it for problematic large groups. I've added a
>> group-level override to the KIP.
>>
>> On Tue, Jan 20, 2026 at 1:38 PM Lianet Magrans <[email protected]>
>> wrote:
>>
>>> Hi Sean, thanks for the KIP.
>>>
>>> LM1: About group.initial.rebalance.delay.ms, I expect the interaction
>>> with the interval is just as described for the streams initial delay and
>>> interval, correct? Should we clarify that in the KIP (it only mentions the
>>> streams case)
>>>
>>> LM2: The KIP refers to batching assignment re-calculations triggered by
>>> member subscription changes, but I expect the batching mechanism applies
>>> the same when the assignment re-calculation is triggered by metadata
>>> changes (i.e. topic/partition created or deleted), without any HB changing
>>> subscriptions. Is my understanding correct?
>>>
>>> LM3: About this section: "*When there is an in-flight assignor run for
>>> the group, there is no new target assignment. We will trigger the next
>>> assignor run on a future heartbeat.*". I expect that the next assignor
>>> run will be triggered on the next HB from this or from any other member of
>>> the group, received after the interval expires (without the members
>>> re-sending the subscription change). Is my expectation correct? If so,
>>> it may be worth clarifying in the KIP to avoid confusion with client-side
>>> implementations.
>>>
>>> Thanks!
>>> Lianet
>>>
>>> On Tue, Jan 13, 2026 at 1:23 AM Sean Quah via dev <[email protected]>
>>> wrote:
>>>
>>>> sq01: We also have to update the SyncGroup request handling to only
>>>> return REBALANCE_IN_PROGRESS when the member's epoch is behind the target
>>>> assignment epoch, not the group epoch. Thanks to Dongnuo for pointing
>>>> this out.
>>>>
>>>> On Thu, Jan 8, 2026 at 5:40 PM Dongnuo Lyu via dev <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Sean, thanks for the KIP! I have a few questions as follows.
>>>>>
>>>>> dl01: Could we mention the handling when the group metadata or topic
>>>>> partition metadata is changed or deleted during the async assignor run?
>>>>>
>>>>> dl02: This might be a question for the overall coordinator executor - do we
>>>>> have plans to apply an explicit size limit to the executor queue? If many
>>>>> groups trigger offloaded assignments simultaneously, should we apply some
>>>>> backpressure for protection?
>>>>>
>>>>> Also resonate with dj05, for small groups default
>>>>> `min.assignor.interval.ms` to 5s might not be necessary, so not sure if we
>>>>> want to make the batch assignment default. Or it might be good to have a
>>>>> per-group enablement.
>>>>>
>>>>> Thanks
>>>>> Dongnuo
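For readers skimming the thread, the sq01 point quoted above (return REBALANCE_IN_PROGRESS based on the target assignment epoch, not the group epoch) might look roughly like the following. This is an illustrative sketch, not the actual SyncGroup handler:

```java
// Illustrative sketch of the sq01 point: with offloaded assignors, the
// group epoch can move ahead of the latest computed target assignment,
// so the SyncGroup check should compare the member's epoch against the
// target assignment epoch rather than the group epoch.
public class SyncGroupCheckSketch {
    // Returns true when SyncGroup should answer REBALANCE_IN_PROGRESS:
    // only when the member is behind the latest *computed* assignment.
    // A pending assignor run alone (group epoch ahead) is not an error.
    static boolean rebalanceInProgress(int memberEpoch, int targetAssignmentEpoch) {
        return memberEpoch < targetAssignmentEpoch;
    }
}
```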
