Re: [DISCUSS] KIP-1263: Group Coordinator Assignment Batching and Offload

Sean Quah via dev Thu, 22 Jan 2026 02:58:24 -0800

Hi all, thanks for the feedback so far.

> dj01: In the proposed changes section, you state that the timestamp of the
> last assignment is not persisted. How do you plan to bookkeep it if it is
> not stored with the assignment? Intuitively, I would add a timestamp to the
> assignment record.

Thinking about it, it's easier to add it to the assignment record. I will
update the KIP. One thing to note is that the timestamp will be subject to
rollbacks when writing to the log fails, so we can allow extra assignment
runs when that happens.
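
As a rough sketch of how a timestamp stored with the assignment record could
gate assignor runs (the class and method names are hypothetical and not taken
from the KIP; the 5 second interval is the default mentioned under dj05):

```java
import java.util.OptionalLong;

// Hypothetical helper, for illustration only; not the KIP's implementation.
final class AssignmentIntervalGate {
    private final long minIntervalMs;

    AssignmentIntervalGate(long minIntervalMs) {
        this.minIntervalMs = minIntervalMs;
    }

    // lastAssignmentTimestampMs is assumed to come from the latest assignment
    // record. If the write of a newer record failed and was rolled back, the
    // caller sees the older timestamp again, so the gate opens sooner.
    boolean mayRunAssignor(OptionalLong lastAssignmentTimestampMs, long nowMs) {
        if (!lastAssignmentTimestampMs.isPresent()) {
            return true; // no assignment has been recorded yet
        }
        return nowMs - lastAssignmentTimestampMs.getAsLong() >= minIntervalMs;
    }
}
```

With a 5000 ms interval, a successful write at t=7000 blocks a run at t=8000.
If that write is rolled back to a previous timestamp of 1000, the run at
t=8000 is allowed, which is the extra assignment run described above.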

> dj02: I wonder whether we should also add a "thread idle ratio" metric for
> the group coordinator executor. What do you think?

I think it could be useful so I've added it to the KIP. The implementation
will have to be different to the event processor, since we currently use an
ExecutorService.
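
One possible approach, sketched under assumptions: wrap each task submitted to
the ExecutorService, accumulate the time spent running tasks, and compare it
against the pool's total capacity over a sampling window. The class name,
the injectable clock and the windowing are all hypothetical; the KIP does not
specify this design.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical tracker, for illustration only; not the KIP's implementation.
final class IdleRatioTracker {
    private final int numThreads;
    private final LongSupplier nanoClock; // e.g. System::nanoTime
    private final AtomicLong busyNanos = new AtomicLong();
    private long windowStartNanos;

    IdleRatioTracker(int numThreads, LongSupplier nanoClock) {
        this.numThreads = numThreads;
        this.nanoClock = nanoClock;
        this.windowStartNanos = nanoClock.getAsLong();
    }

    // Wraps a task so that its run time is added to the busy total.
    Runnable wrap(Runnable task) {
        return () -> {
            long start = nanoClock.getAsLong();
            try {
                task.run();
            } finally {
                busyNanos.addAndGet(nanoClock.getAsLong() - start);
            }
        };
    }

    // Returns the idle ratio for the window since the last call, then resets.
    // A task is attributed to the window in which it finishes, so the value
    // is clamped at 0.
    synchronized double idleRatioAndReset() {
        long now = nanoClock.getAsLong();
        long capacityNanos = (now - windowStartNanos) * numThreads;
        long busy = busyNanos.getAndSet(0);
        windowStartNanos = now;
        if (capacityNanos <= 0) {
            return 1.0;
        }
        return Math.max(0.0, 1.0 - (double) busy / capacityNanos);
    }
}
```

Tasks would then be submitted as executor.submit(tracker.wrap(task)), and the
metric would sample idleRatioAndReset() periodically.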

> dj03: If the executor is not used by the share coordinator, it should not
> expose any metrics about it. Is it possible to remove them?

I've removed them from the KIP. We can add a parameter to the coordinator
metrics class to control whether they are visible.

> dj04: Is having one group coordinator executor thread sufficient by default
> for common workloads?

Yes and no. I expect it will be very difficult to overload an entire
thread, i.e. submit work faster than it can complete it. But updating the
default to two threads could be good for reducing delays due to
simultaneous assignor runs. I've raised the default to 2 threads.

> dj05: It seems you propose enabling the minimum assignor interval with a
> default of 5 seconds. However, the offloading is not enabled by default. Is
> the first one enough to guarantee the stability of the group coordinator?
> How do you foresee enabling the second one in the future? It would be great
> if you could address this in the KIP. We need a clear motivation for
> changing the default behavior and a plan for the future.

I initially thought that offloading would increase rebalance times by 1
heartbeat interval and so didn't propose turning it on by default. But after
some more thinking, I believe both features will increase rebalance times by
1 heartbeat interval and the increase shouldn't stack. The minimum assignor
interval only impacts groups with more than 2 members, while offloading
only impacts groups with a single member. This is because in the other
cases, the extra delays are folded into existing revocation + heartbeat
delays. Note that share groups have no revocation so always see increased
rebalance times. I've updated the KIP to add the analysis of rebalance
times and propose turning both features on by default.

> dj06: Based on its description, I wonder whether
> `consumer.min.assignor.interval.ms` should be called
> `consumer.min.assignment.interval.ms`. What do you think?

Thanks, I've renamed the config options in the KIP. What about the
assignor.offload.enable configs?

> dj07: It is not possible to enable/disable the offloading at the group
> level. This makes sense to me but it would be great to explain the
> rationale for it in the KIP.

Thinking about it, there's nothing stopping us from configuring offloading
at the group level. In fact it might be desirable for some users to disable
offloading at the group coordinator level to keep rebalances fast and only
enable it for problematic large groups. I've added a group-level override
to the KIP.
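
A minimal sketch of how such an override could resolve, assuming the
group-level setting is optional and falls back to the coordinator-level
default when unset (the class and method names are hypothetical):

```java
import java.util.Optional;

// Hypothetical resolver, for illustration only; not the KIP's implementation.
final class OffloadConfigResolver {
    private OffloadConfigResolver() {}

    // coordinatorDefault: the group coordinator level offload setting.
    // groupOverride: the group-level setting, empty when the group does not
    // configure one.
    static boolean offloadEnabled(boolean coordinatorDefault,
                                  Optional<Boolean> groupOverride) {
        return groupOverride.orElse(coordinatorDefault);
    }
}
```

For the use case above, the coordinator-level setting would be false and only
the problematic large groups would set the override to true.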

On Tue, Jan 20, 2026 at 1:38 PM Lianet Magrans <[email protected]> wrote:

> Hi Sean, thanks for the KIP.
>
> LM1: About group.initial.rebalance.delay.ms, I expect the interaction
> with the interval is just as described for the streams initial delay and
> interval, correct? Should we clarify that in the KIP (it only mentions the
> streams case)
>
> LM2: The KIP refers to batching assignment re-calculations triggered by
> member subscription changes, but I expect the batching mechanism applies
> the same when the assignment re-calculation is triggered by metadata
> changes (i.e. topic/partition created or deleted), without any HB changing
> subscriptions. Is my understanding correct?
>
> LM3: About this section: "*When there is an in-flight assignor run for
> the group, there is no new target assignment. We will trigger the next
> assignor run on a future heartbeat.*". I expect that the next assignor
> run will be triggered on the next HB from this or from any other member of
> the group, received after the interval expires (without the members
> re-sending the subscription change). Is my expectation correct? If so,
> it may be worth clarifying in the KIP to avoid confusion with client-side
> implementations.
>
> Thanks!
> Lianet
>
>
>
> On Tue, Jan 13, 2026 at 1:23 AM Sean Quah via dev <[email protected]>
> wrote:
>
>> sq01: We also have to update the SyncGroup request handling to only return
>> REBALANCE_IN_PROGRESS when the member's epoch is behind the target
>> assignment epoch, not the group epoch. Thanks to Dongnuo for pointing this
>> out.
>>
>> On Thu, Jan 8, 2026 at 5:40 PM Dongnuo Lyu via dev <[email protected]>
>> wrote:
>>
>> > Hi Sean, thanks for the KIP! I have a few questions as follows.
>> >
>> > dl01: Could we mention the handling when the group metadata or topic
>> > partition metadata is changed or deleted during the async assignor run?
>> >
>> > dl02: This might be a question for the overall coordinator executor - do we
>> > have plans to apply an explicit size limit to the executor queue? If many
>> > groups trigger offloaded assignments simultaneously, should we apply some
>> > backpressure for protection?
>> >
>> > I also agree with dj05: for small groups, defaulting
>> > `min.assignor.interval.ms` to 5s might not be necessary, so I'm not sure
>> > we want to make batched assignment the default. Alternatively, it might be
>> > good to have a per-group enablement.
>> >
>> > Thanks
>> > Dongnuo
>> >
>>
>
