Kirk True created KAFKA-16555:
---------------------------------
Summary: Consumer's RequestState has incorrect logic to determine if inflight
Key: KAFKA-16555
URL: https://issues.apache.org/jira/browse/KAFKA-16555
Project: Kafka
Issue Type: Task
Components: clients, consumer
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
Fix For: 3.8.0
When running system tests for the new consumer, I've hit an issue where the
{{HeartbeatRequestManager}} is sending out multiple concurrent
{{CONSUMER_GROUP_REQUEST}} RPCs. As a result, the coordinator creates multiple
members, which causes downstream assignment problems.
Here's the order of events:
* Time 202: {{HeartbeatRequestManager.poll()}} determines it's OK to send a
request. In doing so, it updates the {{RequestState}}'s {{lastSentMs}} to the
current timestamp, 202.
* Time 236: the response is received and the response handler is invoked,
setting the {{RequestState}}'s {{lastReceivedMs}} to the current timestamp, 236.
* Time 236: {{HeartbeatRequestManager.poll()}} is invoked again, and it sees
that it's OK to send a request. It creates another request, once again updating
the {{RequestState}}'s {{lastSentMs}} to the current timestamp, 236.
* Time 237: {{HeartbeatRequestManager.poll()}} is invoked again, and
ERRONEOUSLY decides it's OK to send another request, even though one is already
in flight.
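To make the sequence concrete, here is a minimal sketch that replays these timestamps against a simplified, hypothetical stand-in for {{RequestState}}. The names {{TimestampRequestState}}, {{recordSend}} and {{recordResponse}} are illustrative assumptions, not the real client API; only the in-flight comparison is copied from the {{requestInFlight()}} method quoted in this issue.

{code:java}
// Replays the timeline from this issue. Expected output:
//   t=202 in flight: true
//   t=237 in flight: false   <- second request is outstanding, yet reported idle
class InFlightRepro {
    public static void main(String[] args) {
        TimestampRequestState state = new TimestampRequestState();

        state.recordSend(202);      // Time 202: first heartbeat sent
        System.out.println("t=202 in flight: " + state.requestInFlight());

        state.recordResponse(236);  // Time 236: first response handled
        state.recordSend(236);      // Time 236: second heartbeat sent in the same millisecond

        // Time 237: nothing has changed since 236, so poll() sees the same state.
        System.out.println("t=237 in flight: " + state.requestInFlight());
    }
}

// Hypothetical, simplified stand-in for the consumer's RequestState:
// only the two timestamps and the in-flight check quoted in this issue.
class TimestampRequestState {
    long lastSentMs = -1;
    long lastReceivedMs = -1;

    void recordSend(long nowMs) {
        lastSentMs = nowMs;
    }

    void recordResponse(long nowMs) {
        lastReceivedMs = nowMs;
    }

    // Same comparison as the quoted requestInFlight(): strict less-than.
    boolean requestInFlight() {
        return lastSentMs > -1 && lastReceivedMs < lastSentMs;
    }
}
{code}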
Here's the problem with {{requestInFlight()}}:
{code:java}
public boolean requestInFlight() {
    return this.lastSentMs > -1 && this.lastReceivedMs < this.lastSentMs;
}
{code}
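In our case, {{lastReceivedMs}} is 236 and {{lastSentMs}} is _also_ 236, so the
received timestamp is _equal_ to the sent timestamp, not _less than_ it. As a
result, {{requestInFlight()}} returns {{false}} even though the second request
is still outstanding.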
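One possible direction, sketched here only as an assumption and not necessarily how the eventual fix will look: stop inferring the in-flight state from two millisecond timestamps and track it explicitly, setting a flag when a request is sent and clearing it when its response (or failure) is handled. {{FlagRequestState}}, {{recordSend}} and {{recordResponse}} are hypothetical names used only for this illustration.

{code:java}
// Expected output:
//   t=237 in flight: true
//   same-ms response, in flight: false
class InFlightFixDemo {
    public static void main(String[] args) {
        FlagRequestState state = new FlagRequestState();
        state.recordSend(202);
        state.recordResponse(236);
        state.recordSend(236);      // second request sent in the same millisecond
        System.out.println("t=237 in flight: " + state.requestInFlight());

        FlagRequestState fast = new FlagRequestState();
        fast.recordSend(300);
        fast.recordResponse(300);   // response handled in the same millisecond it was sent
        System.out.println("same-ms response, in flight: " + fast.requestInFlight());
    }
}

// Hypothetical sketch: in-flight status is explicit state, not a timestamp comparison.
class FlagRequestState {
    // Timestamps are still recorded (e.g. for backoff), but no longer decide in-flight status.
    long lastSentMs = -1;
    long lastReceivedMs = -1;
    private boolean inFlight = false;

    void recordSend(long nowMs) {
        lastSentMs = nowMs;
        inFlight = true;
    }

    // Call from the response handler for both successful and failed responses.
    void recordResponse(long nowMs) {
        lastReceivedMs = nowMs;
        inFlight = false;
    }

    boolean requestInFlight() {
        return inFlight;
    }
}
{code}

Simply changing {{<}} to {{<=}} in {{requestInFlight()}} would cover the 236/236 case above, but it would then report a request as still in flight if its response were handled in the same millisecond it was sent, which is the second case the demo checks.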