Chris Egerton created KAFKA-17115:
-------------------------------------
Summary: Closing newly-created consumers during rebalance can
cause rebalances to hang
Key: KAFKA-17115
URL: https://issues.apache.org/jira/browse/KAFKA-17115
Project: Kafka
Issue Type: Bug
Components: consumer
Affects Versions: 3.9.0
Reporter: Chris Egerton
Assignee: Chris Egerton
When a dynamic consumer (i.e., one with no group instance ID configured) first
tries to join a group, the group coordinator normally responds with the
MEMBER_ID_REQUIRED error, under the assumption that the member will retry soon
after. During this step, the group coordinator will also generate a new member
ID for the consumer, include it in the error response for the initial join
group request, and expect that a member with that ID will participate in future
rebalances.
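The handshake above can be sketched with a toy model. This is not Kafka's coordinator code: the class names (MemberIdHandshakeSketch, ToyCoordinator, JoinResponse), the handleJoin method, and the "consumer-" ID prefix are all hypothetical, and only the MEMBER_ID_REQUIRED and UNKNOWN_MEMBER_ID error names come from the real protocol. It assumes the coordinator simply keeps a set of IDs it has handed out and is still waiting on.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

public class MemberIdHandshakeSketch {

    // Hypothetical response type: an error name plus the member ID the coordinator reports.
    static class JoinResponse {
        final String error;
        final String memberId;

        JoinResponse(String error, String memberId) {
            this.error = error;
            this.memberId = memberId;
        }
    }

    // Toy stand-in for the group coordinator (an assumption, not the real implementation).
    static class ToyCoordinator {
        // IDs that were generated and returned, but whose owners have not rejoined yet.
        final Set<String> pendingMembers = new HashSet<>();
        // IDs whose owners have retried the join with the generated ID.
        final Set<String> joinedMembers = new HashSet<>();

        JoinResponse handleJoin(String memberId) {
            if (memberId.isEmpty()) {
                // First attempt from a dynamic member: generate an ID, remember it,
                // and ask the member to retry with that ID.
                String generated = "consumer-" + UUID.randomUUID();
                pendingMembers.add(generated);
                return new JoinResponse("MEMBER_ID_REQUIRED", generated);
            }
            if (pendingMembers.remove(memberId) || joinedMembers.contains(memberId)) {
                joinedMembers.add(memberId);
                return new JoinResponse("NONE", memberId);
            }
            return new JoinResponse("UNKNOWN_MEMBER_ID", "");
        }
    }

    public static void main(String[] args) {
        ToyCoordinator coordinator = new ToyCoordinator();

        // Initial join without a member ID.
        JoinResponse first = coordinator.handleJoin("");
        System.out.println(first.error + ", pending=" + coordinator.pendingMembers.size());

        // Retry with the ID from the error response.
        JoinResponse second = coordinator.handleJoin(first.memberId);
        System.out.println(second.error + ", pending=" + coordinator.pendingMembers.size());
    }
}
```

The point of the model is the pendingMembers set: between the first response and the retry, the coordinator is already tracking an ID that the consumer may never have seen.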
If a consumer is closed in between the time that it sends the JoinGroup request
and the time that it receives the response from the group coordinator, it will
not attempt to leave the group, since it does not yet have a member ID to
include in a LeaveGroup request.
This will cause future rebalances to hang, since the group coordinator will
still expect a member with the ID for the now-closed consumer to join.
Eventually, the group coordinator may remove the closed consumer from the
group, but with default configuration settings, this could take as long as five
minutes.
One possible fix is to send a LeaveGroup request with the member ID if the
consumer receives a JoinGroup response with a member ID after it has been
closed.
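A minimal sketch of that idea, again using a toy model rather than the actual consumer or coordinator code. Everything here is hypothetical (LeaveOnLateJoinResponseSketch, ToyCoordinator, ToyConsumer, the leaveOnLateResponse flag, and the "member-" IDs); it assumes the coordinator keeps waiting for every ID it has handed out until that ID leaves.

```java
import java.util.HashSet;
import java.util.Set;

public class LeaveOnLateJoinResponseSketch {

    // Toy coordinator (an assumption): tracks member IDs it expects in future rebalances.
    static class ToyCoordinator {
        final Set<String> expectedMembers = new HashSet<>();
        private int nextId = 0;

        // Models the initial JoinGroup: generate an ID and start expecting it.
        String handleInitialJoin() {
            String memberId = "member-" + (nextId++);
            expectedMembers.add(memberId);
            return memberId;
        }

        // Models LeaveGroup: stop expecting the given ID.
        void handleLeave(String memberId) {
            expectedMembers.remove(memberId);
        }

        // A rebalance would wait as long as any expected member is outstanding.
        boolean rebalanceWouldWait() {
            return !expectedMembers.isEmpty();
        }
    }

    // Toy consumer (an assumption): only the close/late-response logic is modeled.
    static class ToyConsumer {
        final ToyCoordinator coordinator;
        final boolean leaveOnLateResponse; // true = apply the proposed fix
        boolean closed = false;
        String memberId = "";

        ToyConsumer(ToyCoordinator coordinator, boolean leaveOnLateResponse) {
            this.coordinator = coordinator;
            this.leaveOnLateResponse = leaveOnLateResponse;
        }

        void close() {
            closed = true;
            // Without a member ID there is nothing to put in a LeaveGroup request.
            if (!memberId.isEmpty()) {
                coordinator.handleLeave(memberId);
            }
        }

        void onJoinResponse(String assignedMemberId) {
            if (closed) {
                // Proposed fix: the response arrived after close, so leave with its ID.
                if (leaveOnLateResponse) {
                    coordinator.handleLeave(assignedMemberId);
                }
                return;
            }
            memberId = assignedMemberId;
        }
    }

    // Returns whether the coordinator is still waiting on the closed consumer.
    static boolean simulate(boolean leaveOnLateResponse) {
        ToyCoordinator coordinator = new ToyCoordinator();
        ToyConsumer consumer = new ToyConsumer(coordinator, leaveOnLateResponse);
        String inFlightId = coordinator.handleInitialJoin(); // response not yet delivered
        consumer.close();                                    // closed while response is in flight
        consumer.onJoinResponse(inFlightId);                 // response arrives after close
        return coordinator.rebalanceWouldWait();
    }

    public static void main(String[] args) {
        System.out.println("without fix, rebalance waits: " + simulate(false)); // true
        System.out.println("with fix, rebalance waits: " + simulate(true));     // false
    }
}
```

In the real client the response is handled asynchronously, so an actual fix would also need to keep the network client alive long enough after close to send the LeaveGroup request; the sketch ignores that.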
This applies to the legacy consumer; I have not yet verified whether it also
affects the new async consumer.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)