[ https://issues.apache.org/jira/browse/KAFKA-17115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864812#comment-17864812 ]
Lianet Magrans commented on KAFKA-17115:
----------------------------------------
Hey [~ChrisEgerton], even though the new consumer doesn't have this same flow
(no member ID is required to join), I believe it has the same gap you pointed
out:
# new consumer joins -> sends HB with epoch 0 (no member ID)
# consumer closed -> this will still generate the HB to leave (epoch -1), but
with no member ID because the consumer does not have one yet, so the broker
cannot process it correctly anyway
# consumer receives response to initial HB to join (response containing member
ID)
So yes, we would end up in a similar situation. It is less bad only because,
with the new protocol and no global barrier, we wouldn't have a blocked
rebalance, just a member that is registered in the group and may have received
partitions. Those partitions won't be re-assigned until the rebalance timeout
expires and the broker gives them to someone else. The member would be kicked
out of the group when its session expires.
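The three steps above can be sketched as a toy model. This is only an
illustration of the ordering problem, not Kafka code: ToyCoordinator, its
heartbeat method, and simulate_close_before_join_response are hypothetical
names, and the sketch assumes the coordinator generates the member ID on the
first heartbeat, as described above.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple

JOIN_EPOCH = 0    # epoch sent on the initial heartbeat to join
LEAVE_EPOCH = -1  # epoch sent on the heartbeat to leave


@dataclass
class ToyCoordinator:
    """Hypothetical stand-in for the group coordinator (not broker code)."""
    members: Set[str] = field(default_factory=set)

    def heartbeat(self, member_id: Optional[str], epoch: int) -> Optional[str]:
        if epoch == JOIN_EPOCH and member_id is None:
            # Assumption: the coordinator generates the member ID on join.
            new_id = str(uuid.uuid4())
            self.members.add(new_id)
            return new_id
        if epoch == LEAVE_EPOCH:
            # Without a member ID there is no member to remove.
            self.members.discard(member_id)
            return None
        return member_id


def simulate_close_before_join_response() -> Tuple[ToyCoordinator, str]:
    coordinator = ToyCoordinator()
    member_id = None

    # Step 1: join heartbeat is sent; its response is still in flight.
    in_flight_response = coordinator.heartbeat(member_id, JOIN_EPOCH)

    # Step 2: consumer is closed, so the leave heartbeat carries no member ID.
    coordinator.heartbeat(member_id, LEAVE_EPOCH)

    # Step 3: the join response (with the member ID) arrives after close.
    member_id = in_flight_response
    return coordinator, member_id
```

In this model the coordinator still lists the member after the consumer has
been closed, which mirrors the gap described above.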
I will file a separate Jira to review and fix this edge case with the new
consumer. Thanks!
> Closing newly-created consumers during rebalance can cause rebalances to hang
> -----------------------------------------------------------------------------
>
> Key: KAFKA-17115
> URL: https://issues.apache.org/jira/browse/KAFKA-17115
> Project: Kafka
> Issue Type: Bug
> Components: consumer
> Affects Versions: 3.9.0
> Reporter: Chris Egerton
> Assignee: Chris Egerton
> Priority: Major
>
> When a dynamic consumer (i.e., one with no group instance ID configured)
> first tries to join a group, the group coordinator normally responds with the
> MEMBER_ID_REQUIRED error, under the assumption that the member will retry
> soon after. During this step, the group coordinator will also generate a new
> member ID for the consumer, include it in the error response for the initial
> join group request, and expect that a member with that ID will participate in
> future rebalances.
> If a consumer is closed in between the time that it sends the JoinGroup
> request and the time that it receives the response from the group
> coordinator, it will not attempt to leave the group, since it doesn't have a
> member ID to include in that request.
> This will cause future rebalances to hang, since the group coordinator will
> still expect a member with the ID for the now-closed consumer to join.
> Eventually, the group coordinator may remove the closed consumer from the
> group, but with default configuration settings, this could take as long as
> five minutes.
> One possible fix is to send a LeaveGroup request with the member ID if the
> consumer receives a JoinGroup response with a member ID after it has been
> closed.
>
> This applies to the legacy consumer; I have not verified yet with the new
> async consumer.
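The fix proposed in the issue description can be sketched as follows. This is
a minimal toy model under stated assumptions, not the actual consumer
implementation: ToyConsumer, on_join_group_response, and the send_leave_group
callback are hypothetical names introduced only for illustration.

```python
from typing import Callable, Optional


class ToyConsumer:
    """Hypothetical model of a consumer's join/close handling (not Kafka code)."""

    def __init__(self, send_leave_group: Callable[[str], None]) -> None:
        self.member_id: Optional[str] = None
        self.closed = False
        # Assumption: a callback that sends a LeaveGroup request for a member ID.
        self._send_leave_group = send_leave_group

    def close(self) -> None:
        self.closed = True
        # A member ID is needed to leave; if none is known yet, nothing is sent.
        if self.member_id is not None:
            self._send_leave_group(self.member_id)

    def on_join_group_response(self, member_id: str) -> None:
        if self.closed:
            # Late response after close: leave with the ID the coordinator
            # assigned, so it does not keep waiting for this member.
            self._send_leave_group(member_id)
            return
        self.member_id = member_id
```

For example, with `sent = []` and `consumer = ToyConsumer(sent.append)`,
calling `consumer.close()` and then `consumer.on_join_group_response("m-1")`
leaves `sent` containing only `"m-1"`.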