[ https://issues.apache.org/jira/browse/KAFKA-17170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lianet Magrans resolved KAFKA-17170.
------------------------------------
    Fix Version/s: 4.0.0
         Reviewer: Lianet Magrans
       Resolution: Fixed

> Add test to ensure new consumer acks reconciled assignment even if first HB with ack lost
> -----------------------------------------------------------------------------------------
>
>                 Key: KAFKA-17170
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17170
>             Project: Kafka
>          Issue Type: Task
>          Components: clients, consumer
>            Reporter: Lianet Magrans
>            Assignee: 黃竣陽
>            Priority: Minor
>              Labels: kip-848-client-support, newbie
>             Fix For: 4.0.0
>
>
> When a consumer reconciles an assignment, it transitions to ACKNOWLEDGING, so
> that a HB is sent on the next manager poll, without waiting for the interval.
> The consumer transitions out of this ack state as soon as it sends the
> heartbeat, without waiting for a response. This is based on the expectation
> that the following heartbeats (sent on the interval) will act as the ack and
> include the set of partitions, even if the first ack is lost. This is the
> expected flow:
> # complete reconciliation and send HB1 to ack assignment tp0
> # HB1 times out (or fails in any way) => the heartbeat request manager resets
> the sentFields to null (HeartbeatState.reset(), triggered if the request
> fails, or if it gets a response with an Error)
> # the following HB will include tp0 (and act as ack), because it will notice
> that tp0 != null (last value sent)
> This does not seem to be covered by any test, so we should add a unit test to
> the HeartbeatRequestManager to ensure that the HB generated in step 3 above
> includes tp0 as expected, considering both error cases: the request fails
> (no response) and the request gets a response with an Error in it.
> This flow is important because, if the member fails to send the reconciled
> partitions in a HB, the broker will keep waiting for an ack that the member
> considers already sent (the broker would wait for the rebalance timeout
> before re-assigning those partitions).
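
The expected flow in the description can be illustrated with a small,
self-contained Java sketch. This is a simplified model written for
illustration only, not the Kafka client code: the class
HeartbeatSentFieldsSketch and the methods reconcile() and buildHeartbeat()
are hypothetical names, and only the idea of a reset() that clears the last
sent value comes from the issue text. The assumption is that a heartbeat
carries the assignment only when it differs from the last value sent.

```java
import java.util.List;
import java.util.Objects;
import java.util.Optional;

// Hypothetical, simplified model of the "sent fields" bookkeeping described
// in the issue. It does not use or mirror the real Kafka client classes.
final class HeartbeatSentFieldsSketch {

    // Last assignment included in a heartbeat; null means "nothing sent".
    private List<String> sentAssignment = null;

    // Assignment the member owns after the latest reconciliation.
    private List<String> currentAssignment = List.of();

    // Step 1: reconciliation completes with a new assignment.
    void reconcile(List<String> assignment) {
        currentAssignment = List.copyOf(assignment);
    }

    // Builds the next heartbeat. The assignment is included only when it
    // differs from the last value sent (assumed rule for this sketch).
    Optional<List<String>> buildHeartbeat() {
        if (Objects.equals(currentAssignment, sentAssignment)) {
            return Optional.empty();
        }
        sentAssignment = currentAssignment;
        return Optional.of(currentAssignment);
    }

    // Step 2: the request failed or the response carried an error, so the
    // record of what was sent is cleared.
    void reset() {
        sentAssignment = null;
    }

    public static void main(String[] args) {
        HeartbeatSentFieldsSketch state = new HeartbeatSentFieldsSketch();
        state.reconcile(List.of("tp0"));

        Optional<List<String>> hb1 = state.buildHeartbeat(); // ack attempt
        state.reset();                                       // HB1 lost
        Optional<List<String>> hb2 = state.buildHeartbeat(); // step 3
        Optional<List<String>> hb3 = state.buildHeartbeat(); // HB2 succeeded

        System.out.println("HB1 carries assignment: " + hb1.isPresent());
        System.out.println("HB2 carries assignment: " + hb2.isPresent());
        System.out.println("HB3 carries assignment: " + hb3.isPresent());
    }
}
```

In this model, the heartbeat built after reset() carries tp0 again, while a
heartbeat built after a successful send omits it. A unit test for the real
HeartbeatRequestManager would assert the same property for both error cases
named in the issue (no response, and a response with an Error).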