Konstantine Karantasis created KAFKA-9848:
---------------------------------------------
Summary: Avoid triggering scheduled rebalance delay when task
assignment fails but Connect workers remain in the group
Key: KAFKA-9848
URL: https://issues.apache.org/jira/browse/KAFKA-9848
Project: Kafka
Issue Type: Bug
Components: KafkaConnect
Affects Versions: 2.4.1, 2.3.1, 2.5.0
Reporter: Konstantine Karantasis
Assignee: Konstantine Karantasis
There are cases where a Connect worker does not receive its task assignments
successfully after a rebalance but still remains in the group. For example,
when a SyncGroup response is lost, a worker will not get its expected
assignments but will rejoin the group immediately and trigger another
rebalance.
With incremental cooperative rebalancing, task assignments that are computed
and sent by the leader but are not received by any of the members are marked as
lost assignments in the subsequent rebalance. The presence of lost assignments
activates the scheduled rebalance delay (property
scheduled.rebalance.max.delay.ms) and the missing tasks are not assigned until
this delay expires.
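The behavior described above can be modeled with a minimal sketch. This is an illustration under stated assumptions, not Connect's actual assignor code: the function names, the dict-of-sets representation of assignments, and the delay value passed in are all hypothetical.

```python
# Illustrative model only; names are hypothetical, not Kafka Connect classes.

def find_lost_tasks(previous_assignment, current_reports):
    """Return tasks the leader assigned last round that no member reports now.

    previous_assignment: dict of worker -> set of task ids sent by the leader.
    current_reports: dict of worker -> set of task ids each member says it runs.
    """
    expected = set().union(*previous_assignment.values())
    reported = set().union(*current_reports.values())
    return expected - reported


def plan_round(previous_assignment, current_reports, max_delay_ms):
    """Return (tasks_to_assign_now, delay_ms) for the behavior described above."""
    lost = find_lost_tasks(previous_assignment, current_reports)
    if lost:
        # Lost tasks are held back until the scheduled delay expires.
        return set(), max_delay_ms
    return set(), 0
```

For instance, if the leader sent {"t2", "t3"} to a worker that missed its SyncGroup response and therefore reports no tasks, both tasks count as lost and the whole delay applies, even though that worker is still in the group.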
This situation can be improved in two cases:
a) When the leader itself failed to receive the new assignments from the
broker coordinator (for example, if the SyncGroup request or response was lost).
If this worker remains the leader of the group in the subsequent rebalance
round, it can detect that the previous assignment was not successfully applied
by checking the expected generation.
b) If one or more regular members did not receive their assignments
successfully, but have joined the latest round of rebalancing, they can be
assigned the tasks that remain unassigned from the previous assignment
immediately, without these tasks being marked as lost. The leader can detect
this by checking that some tasks appear lost since the previous assignment
while the number of workers is unchanged between the two rounds of rebalancing.
In this case, the leader can go ahead and assign the missing tasks as new tasks
immediately.
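The two checks proposed in a) and b) could look roughly like the sketch below. It is one possible reading of the proposal, not the eventual fix: every name is hypothetical, generations are modeled as plain integers, and the least-loaded placement in spread() is an assumption added only to make the example complete.

```python
# Illustrative model only; names are hypothetical, not Kafka Connect classes.

def previous_assignment_applied(expected_generation, last_applied_generation):
    """Case a): did the assignment the leader computed actually reach it?"""
    return expected_generation == last_applied_generation


def tasks_to_assign_immediately(lost_tasks, previous_worker_count, current_worker_count):
    """Case b): lost tasks that can skip the delay because no worker left."""
    if lost_tasks and previous_worker_count == current_worker_count:
        return set(lost_tasks)
    return set()


def spread(tasks, load):
    """Place each task on the currently least-loaded worker (assumed heuristic).

    load: dict of worker -> number of tasks it already runs.
    """
    load = dict(load)  # work on a copy so the caller's view is unchanged
    placement = {}
    for task in sorted(tasks):
        # Ties are broken by worker name to keep the result deterministic.
        worker = min(sorted(load), key=load.get)
        placement[task] = worker
        load[worker] += 1
    return placement
```

With two workers in both rounds and tasks "t2" and "t3" missing, tasks_to_assign_immediately returns both tasks, and spread places them on whichever worker currently runs the fewest tasks. If the worker count had dropped, the set would be empty and the scheduled delay would still apply.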
--
This message was sent by Atlassian Jira
(v8.3.4#803005)