> dl01: Could we mention the handling when the group metadata or topic
> partition metadata is changed or deleted during the async assignor run?

Thanks! I've added a paragraph to the Assignment Offload section
describing the handling of group metadata changes. Topic metadata changes
already bump the group epoch, so we don't need to handle them specially.
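
For illustration, the handling has roughly the following shape. This is
only a sketch with made-up names, not the exact interfaces from the KIP:

    // Sketch only; names are hypothetical, not the KIP's interfaces.
    void onAssignorRunCompleted(String groupId, int epochAtSubmission,
                                TargetAssignment result) {
        ConsumerGroup group = groups.get(groupId);
        if (group == null || group.groupEpoch() != epochAtSubmission) {
            // The group was deleted, or group/topic metadata changed and
            // bumped the group epoch while the assignor was running.
            // Discard the stale result; a later heartbeat triggers a
            // fresh assignor run.
            return;
        }
        installTargetAssignment(group, result);
    }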

> dl02: This might be a question for the overall coordinator executor - do
> we have plans to apply an explicit size limit to the executor queue? If
> many groups trigger offloaded assignments simultaneously, should we apply
> some backpressure for protection?

There aren't any plans for that right now. We actually don't have a size
limit for the event processor queue either.
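
That said, if we ever did want backpressure there, the standard JDK
building blocks would be enough. Something along these lines (a sketch,
not part of the KIP; the queue size of 1024 is an arbitrary example):

    import java.util.concurrent.*;

    // Cap the executor queue so that submissions beyond the cap are
    // rejected instead of piling up without bound.
    ExecutorService executor = new ThreadPoolExecutor(
        2, 2,                            // the proposed default of 2 threads
        0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(1024),  // explicit queue size limit
        new ThreadPoolExecutor.AbortPolicy());  // reject when full

    // The coordinator could catch the RejectedExecutionException from
    // submit() and retry the assignor run on a later heartbeat, which
    // gives a natural form of backpressure.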

On Thu, Jan 22, 2026 at 10:56 AM Sean Quah <[email protected]> wrote:

> Hi all, thanks for the feedback so far.
>
>> dj01: In the proposed changes section, you state that the timestamp of
>> the last assignment is not persisted. How do you plan to bookkeep it if
>> it is not stored with the assignment? Intuitively, I would add a
>> timestamp to the assignment record.
>
> Thinking about it, it's easier to add it to the assignment record. I
> will update the KIP. One thing to note is that the timestamp will be
> subject to rollbacks when writing to the log fails, so we can allow
> extra assignment runs when that happens.
>
>> dj02: I wonder whether we should also add a "thread idle ratio" metric
>> for the group coordinator executor. What do you think?
>
> I think it could be useful, so I've added it to the KIP. The
> implementation will have to be different from the event processor's,
> since we currently use an ExecutorService.
>
>> dj03: If the executor is not used by the share coordinator, it should
>> not expose any metrics about it. Is it possible to remove them?
>
> I've removed them from the KIP. We can add a parameter to the
> coordinator metrics class to control whether they are visible.
>
>> dj04: Is having one group coordinator executor thread sufficient by
>> default for common workloads?
>
> Yes and no. I expect it will be very difficult to overload an entire
> thread, i.e. submit work faster than it can complete it. But updating
> the default to two threads could help reduce delays from simultaneous
> assignor runs. I've raised the default to 2 threads.
>
>> dj05: It seems you propose enabling the minimum assignor interval with
>> a default of 5 seconds. However, the offloading is not enabled by
>> default. Is the first one enough to guarantee the stability of the
>> group coordinator? How do you foresee enabling the second one in the
>> future? It would be great if you could address this in the KIP. We need
>> a clear motivation for changing the default behavior and a plan for the
>> future.
>
> I initially thought that offloading would increase rebalance times by 1
> heartbeat interval and so didn't propose turning it on by default. But
> after some more thinking, I believe both features will increase
> rebalance times by 1 heartbeat interval and the increase shouldn't
> stack. The minimum assignor interval only impacts groups with more than
> 2 members, while offloading only impacts groups with a single member.
> This is because in the other cases, the extra delays are folded into
> existing revocation + heartbeat delays. Note that share groups have no
> revocation, so they always see increased rebalance times. I've updated
> the KIP to add the analysis of rebalance times and propose turning both
> features on by default.
>
>> dj06: Based on its description, I wonder whether
>> `consumer.min.assignor.interval.ms` should be called
>> `consumer.min.assignment.interval.ms`. What do you think?
>
> Thanks, I've renamed the config options in the KIP. What about the
> assignor.offload.enable configs?
>
>> dj07: It is not possible to enable/disable the offloading at the group
>> level. This makes sense to me but it would be great to explain the
>> rationale for it in the KIP.
>
> Thinking about it, there's nothing stopping us from configuring
> offloading at the group level. In fact, it might be desirable for some
> users to disable offloading at the group coordinator level to keep
> rebalances fast and only enable it for problematic large groups. I've
> added a group-level override to the KIP.
>
> On Tue, Jan 20, 2026 at 1:38 PM Lianet Magrans <[email protected]> wrote:
>
>> Hi Sean, thanks for the KIP.
>>
>> LM1: About group.initial.rebalance.delay.ms, I expect the interaction
>> with the interval is just as described for the streams initial delay
>> and interval, correct? Should we clarify that in the KIP? (It only
>> mentions the streams case.)
>>
>> LM2: The KIP refers to batching assignment re-calculations triggered by
>> member subscription changes, but I expect the batching mechanism
>> applies the same when the assignment re-calculation is triggered by
>> metadata changes (i.e. topic/partition created or deleted), without any
>> HB changing subscriptions. Is my understanding correct?
>>
>> LM3: About this section: "*When there is an in-flight assignor run for
>> the group, there is no new target assignment. We will trigger the next
>> assignor run on a future heartbeat.*" I expect that the next assignor
>> run will be triggered on the next HB from this or from any other member
>> of the group, received after the interval expires (without the members
>> re-sending the subscription change). Is my expectation correct? If so,
>> it may be worth clarifying in the KIP to avoid confusion with
>> client-side implementations.
>>
>> Thanks!
>> Lianet
>>
>> On Tue, Jan 13, 2026 at 1:23 AM Sean Quah via dev <[email protected]>
>> wrote:
>>
>>> sq01: We also have to update the SyncGroup request handling to only
>>> return REBALANCE_IN_PROGRESS when the member's epoch is behind the
>>> target assignment epoch, not the group epoch. Thanks to Dongnuo for
>>> pointing this out.
>>>
>>> On Thu, Jan 8, 2026 at 5:40 PM Dongnuo Lyu via dev <[email protected]>
>>> wrote:
>>>
>>>> Hi Sean, thanks for the KIP! I have a few questions as follows.
>>>>
>>>> dl01: Could we mention the handling when the group metadata or topic
>>>> partition metadata is changed or deleted during the async assignor
>>>> run?
>>>>
>>>> dl02: This might be a question for the overall coordinator executor -
>>>> do we have plans to apply an explicit size limit to the executor
>>>> queue? If many groups trigger offloaded assignments simultaneously,
>>>> should we apply some backpressure for protection?
>>>>
>>>> I also resonate with dj05: for small groups, defaulting
>>>> `min.assignor.interval.ms` to 5s might not be necessary, so I'm not
>>>> sure we want to make the batched assignment the default. Or it might
>>>> be good to have per-group enablement.
>>>>
>>>> Thanks
>>>> Dongnuo
