[ https://issues.apache.org/jira/browse/KAFKA-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866468#comment-17866468 ]
TengYao Chi commented on KAFKA-17116:
-------------------------------------

Hi [~lianetm],

I have done some research; here is my understanding:

The root cause is that the broker receives the subscribe heartbeat and assigns a memberId to the consumer. However, the consumer sends a close heartbeat before receiving the assigned memberId. Consequently, the broker receives a close heartbeat with an invalid memberId (an empty string), making it unable to identify which consumer wants to leave. As a result, the broker has to wait until the heartbeat timeout to rebalance the group.

----

I have a simple idea: add a check at the {{heartbeatNow}} stage to see whether a memberId has already been assigned. A close heartbeat should be sent only if a memberId is present.

What do you think about this approach?

Btw, I wonder if I could reproduce the issue on my machine. It doesn't seem that easy to do. :P

> New consumer may not send effective leave group if member ID received after
> close
> ----------------------------------------------------------------------------------
>
>                 Key: KAFKA-17116
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17116
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer
>    Affects Versions: 3.8.0
>            Reporter: Lianet Magrans
>            Assignee: TengYao Chi
>            Priority: Major
>              Labels: kip-848-client-support
>             Fix For: 3.9.0
>
>
> If the new consumer is closed after sending a HB to join, but before
> receiving the response to it, it will send a leave group request without a
> member ID (which will simply fail with UNKNOWN_MEMBER_ID). As a result, the
> broker will have a newly registered member for which it will never receive a
> leave request.
> # consumer.subscribe -> sends HB to join, transitions to JOINING
> # consumer.close -> will transition to LEAVING and send HB with epoch -1
> (without waiting for in-flight requests)
> # consumer receives response to initial HB, containing the assigned member
> ID. It will simply ignore it because it's not in the group anymore
> (UNSUBSCRIBED)
> Note that the expected behavior with the current logic, and its main
> downsides, are:
> # If the member received partitions on the first HB, those partitions won't
> be re-assigned (the broker is waiting for the closed consumer to reconcile
> them) until the rebalance timeout expires.
> # Even if no partitions were assigned to it, the member will remain in the
> group from the broker's point of view (but not from the client's POV). The
> member will eventually be kicked out for not sending HBs, but only when its
> session timeout expires.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
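
To make the check proposed in the comment above concrete, here is a minimal Java sketch of such a guard. It is not the Kafka client code: {{MemberState}}, {{LeaveHeartbeat}}, {{onHeartbeatResponse}} and {{buildLeaveHeartbeat}} are hypothetical names chosen for illustration, and only the empty-string member ID and the leave epoch of -1 come from the discussion above.

```java
import java.util.Optional;

/** Hypothetical leave heartbeat payload; not a Kafka class. */
final class LeaveHeartbeat {
    // Leave epoch of -1, as mentioned in the issue description.
    static final int LEAVE_EPOCH = -1;

    final String memberId;
    final int memberEpoch;

    LeaveHeartbeat(String memberId) {
        this.memberId = memberId;
        this.memberEpoch = LEAVE_EPOCH;
    }
}

/** Hypothetical stand-in for the client-side membership state; not a Kafka class. */
final class MemberState {
    // Empty until the response to the first (join) heartbeat arrives.
    private String memberId = "";

    /** Records the member ID carried by a heartbeat response. */
    void onHeartbeatResponse(String assignedMemberId) {
        this.memberId = assignedMemberId;
    }

    /**
     * Builds a leave heartbeat only when a member ID is known.
     * With an empty member ID the broker could not tell which member is leaving,
     * so nothing is returned and the caller sends nothing.
     */
    Optional<LeaveHeartbeat> buildLeaveHeartbeat() {
        if (memberId.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new LeaveHeartbeat(memberId));
    }
}
```

In this sketch a caller closing the consumer would send the heartbeat only if {{buildLeaveHeartbeat()}} returns a value. Note that the guard on its own only avoids a request that would fail with UNKNOWN_MEMBER_ID; as the issue description points out, the member registered by the first HB would still stay in the group on the broker side until a timeout expires.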