[ https://issues.apache.org/jira/browse/KAFKA-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866468#comment-17866468 ]

TengYao Chi commented on KAFKA-17116:
-------------------------------------

Hi [~lianetm] 
I have done some research; here is my understanding:

The root cause is that the broker receives the subscribe heartbeat and assigns 
a memberId to the consumer. However, the consumer sends a close heartbeat 
before receiving the assigned memberId. Consequently, the broker receives a 
close heartbeat with an invalid memberId (an empty string), making it unable to 
identify which consumer wants to leave. As a result, the broker has to wait 
until the member's session timeout expires before it can remove the member and 
rebalance the group.
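To make the race concrete, here is a toy model of the coordinator's bookkeeping. It is not Kafka code: {{ToyGroupCoordinator}}, {{join}} and {{leave}} are hypothetical names, and the model only captures the one property described above, namely that a leave heartbeat is matched by memberId, so a leave carrying an empty memberId removes nothing.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

/** Toy model of the coordinator side of the race (hypothetical names, not Kafka APIs). */
public class ToyGroupCoordinator {
    // memberIds the coordinator currently believes are in the group
    private final Set<String> members = new HashSet<>();

    /** Handles a join heartbeat: registers a new member and returns the memberId assigned to it. */
    public String join() {
        String memberId = UUID.randomUUID().toString();
        members.add(memberId);
        return memberId;
    }

    /**
     * Handles a leave heartbeat (epoch -1). Returns true only if a registered member was removed;
     * an empty or unknown memberId cannot be matched, so the registered member stays in the group.
     */
    public boolean leave(String memberId) {
        return members.remove(memberId);
    }

    /** Number of members the coordinator still tracks. */
    public int memberCount() {
        return members.size();
    }
}
```

After {{join()}} followed by {{leave("")}}, {{memberCount()}} is still 1; in the real broker such a member is only cleaned up by a timeout, which this toy does not model.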
----
I have a simple idea: add a check at the {{heartbeatNow}} stage to see whether 
a memberId has already been assigned, and send the close heartbeat only if a 
memberId is present.
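A minimal sketch of the proposed check, assuming the client tracks its memberId as a string that stays empty until the broker's join response arrives. {{LeaveHeartbeatGuard}} and {{shouldSendLeaveHeartbeat}} are hypothetical names; this is not the actual {{heartbeatNow}} logic in the Kafka client.

```java
/** Hypothetical sketch of the proposed check; not the real Kafka consumer code. */
public class LeaveHeartbeatGuard {
    /**
     * Decides whether a close (leave) heartbeat should be sent.
     * Only a non-empty memberId, i.e. one already received from the broker,
     * can be matched by the coordinator; an empty one would fail with UNKNOWN_MEMBER_ID.
     */
    public static boolean shouldSendLeaveHeartbeat(String memberId) {
        return memberId != null && !memberId.isEmpty();
    }
}
```

With this check alone, the client avoids a leave request that cannot be matched, but the member registered by the join heartbeat would still only be removed when its session timeout expires.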

What do you think about this approach?

Btw, I'm not sure I can reproduce the issue on my machine. It doesn't seem 
that easy to do. :P

> New consumer may not send effective leave group if member ID received after 
> close 
> ----------------------------------------------------------------------------------
>
>                 Key: KAFKA-17116
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17116
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer
>    Affects Versions: 3.8.0
>            Reporter: Lianet Magrans
>            Assignee: TengYao Chi
>            Priority: Major
>              Labels: kip-848-client-support
>             Fix For: 3.9.0
>
>
> If the new consumer is closed after sending a HB to join, but before 
> receiving the response to it, it will send a leave group request without a 
> member ID (which will simply fail with UNKNOWN_MEMBER_ID). As a result, the 
> broker will have a newly registered member for which it will never receive a 
> leave request.
>  # consumer.subscribe -> sends HB to join, transitions to JOINING
>  # consumer.close -> will transition to LEAVING and send HB with epoch -1 
> (without waiting for in-flight requests)
>  # consumer receives response to initial HB, containing the assigned member 
> ID. It will simply ignore it because it's not in the group anymore 
> (UNSUBSCRIBED)
> Note that, with the current logic, the expected consequences and main 
> downsides of this are:
>  # If the member received partitions in the response to the first HB, those 
> partitions won't be re-assigned (the broker waits for the closed consumer to 
> reconcile them) until the rebalance timeout expires.
>  # Even if no partitions were assigned to it, the member will remain in the 
> group from the broker's point of view (but not from the client's POV). The 
> member will eventually be kicked out for not sending HBs, but only when its 
> session timeout expires.
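The three steps in the description can be modelled as a toy client state machine. This is a sketch under stated assumptions, not the real consumer membership code: the class and method names are hypothetical, the join heartbeat is assumed to carry member epoch 0, and the toy assumes the client reaches UNSUBSCRIBED as soon as the leave heartbeat is sent.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the client-side sequence in KAFKA-17116 (hypothetical names, not Kafka APIs). */
public class ToyConsumerMembership {
    public enum State { UNSUBSCRIBED, JOINING, LEAVING }

    /** A heartbeat sent by the toy client: memberId plus member epoch (-1 means leave). */
    public record Heartbeat(String memberId, int epoch) {}

    private State state = State.UNSUBSCRIBED;
    private String memberId = ""; // empty until the broker's join response is accepted
    private final List<Heartbeat> sent = new ArrayList<>();

    /** Step 1: subscribe sends a join HB (no memberId yet) and transitions to JOINING. */
    public void subscribe() {
        state = State.JOINING;
        sent.add(new Heartbeat(memberId, 0));
    }

    /** Step 2: close transitions to LEAVING and sends a HB with epoch -1 without waiting for the in-flight join. */
    public void close() {
        state = State.LEAVING;
        sent.add(new Heartbeat(memberId, -1)); // memberId is still empty here
        state = State.UNSUBSCRIBED; // toy assumption: leave completes immediately on the client side
    }

    /** Step 3: the join response arrives; it is ignored because the client is no longer in the group. */
    public void onJoinResponse(String assignedMemberId) {
        if (state == State.JOINING) {
            memberId = assignedMemberId;
        }
    }

    public State state() { return state; }
    public String memberId() { return memberId; }
    public List<Heartbeat> sent() { return sent; }
}
```

Running {{subscribe()}}, {{close()}} and then {{onJoinResponse("member-1")}} ends in UNSUBSCRIBED with an empty memberId, and the second heartbeat sent is a leave (epoch -1) with an empty memberId, matching steps 1 to 3 above.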



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
