Lucas Brutschy created KAFKA-20673:
--------------------------------------
Summary: AdminClient partition-leader APIs hang when a cached
leader has left the cluster
Key: KAFKA-20673
URL: https://issues.apache.org/jira/browse/KAFKA-20673
Project: Kafka
Issue Type: Task
Affects Versions: 4.1.2, 4.1.1, 4.0.2, 4.3.0, 4.2.0, 4.1.0, 4.0.1, 4.0.0
Reporter: Lucas Brutschy
Assignee: Lucas Brutschy
Fix For: 4.4.0
{{KafkaAdminClient.listOffsets}}, like any other
{{PartitionLeaderStrategy}}-routed API ({{deleteRecords}},
{{describeProducers}}, {{abortTransaction}}), can block for the full
{{default.api.timeout.ms}} and then fail with {{TimeoutException: Timed out
waiting for a node assignment}} when the partition-leader cache holds a leader
id that is no longer present in the cluster metadata.
Since the partition-leader cache fast-path was added (KAFKA-17663,
[#17367|https://github.com/apache/kafka/pull/17367]), the {{AdminApiDriver}}
constructor reads {{future.cachedKeyBrokerIdMapping()}} and, for any cached
entry, routes the key straight into the fulfillment stage under a
{{FulfillmentScope(brokerId)}}, skipping the lookup stage. The resulting
{{Call}} is given a {{ConstantNodeIdProvider(brokerId)}}.
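The routing decision described above can be sketched roughly as follows. This is a simplified, hypothetical model and not the actual {{AdminApiDriver}} code: the class {{CachedRoutingSketch}}, its fields, and the string keys are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of cache-driven routing; not Kafka's AdminApiDriver.
final class CachedRoutingSketch {
    // Keys with no cached leader: these still go through the lookup stage.
    final List<String> lookupKeys = new ArrayList<>();
    // Keys with a cached leader, grouped by broker id: these skip the lookup stage.
    final Map<Integer, List<String>> fulfillmentKeys = new LinkedHashMap<>();

    CachedRoutingSketch(List<String> keys, Map<String, Integer> cachedLeaders) {
        for (String key : keys) {
            Integer brokerId = cachedLeaders.get(key);
            if (brokerId == null) {
                lookupKeys.add(key);
            } else {
                // In this sketch, nothing verifies that brokerId is still in the cluster.
                fulfillmentKeys.computeIfAbsent(brokerId, id -> new ArrayList<>()).add(key);
            }
        }
    }

    public static void main(String[] args) {
        CachedRoutingSketch routing = new CachedRoutingSketch(
                List.of("topic-0", "topic-1"), Map.of("topic-0", 3));
        System.out.println("lookup stage: " + routing.lookupKeys);
        System.out.println("fulfillment stage: " + routing.fulfillmentKeys);
    }
}
```

In this model a key with a cached leader never reaches the lookup stage, so a stale broker id is carried straight into fulfillment.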
If that cached {{brokerId}} is no longer in
{{AdminMetadataManager.cluster().nodes()}}, the call gets stuck:
* {{ConstantNodeIdProvider.provide()}} calls {{nodeById(id)}}, gets {{null}},
calls {{metadataManager.requestUpdate()}}, and returns {{null}}.
* {{maybeDrainPendingCall}} sees {{null}} and leaves the call in
{{pendingCalls}}.
* The only ways out of {{fulfillmentMap}} are {{unmap()}} via {{retryLookup}}
(driven by an {{onResponse}}/{{onFailure}} for a request that was
actually _sent_) or a {{DisconnectException}}. Neither fires for a call
that is never sent.
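The loop described in the bullets above can be illustrated with a small self-contained model. It is only a sketch under assumptions: {{StaleLeaderSketch}}, its methods, and the tick-based deadline are hypothetical stand-ins, not the real {{KafkaAdminClient}} internals.

```java
import java.util.Set;

// Hypothetical model of the hang; names and the tick-based deadline are illustrative only.
final class StaleLeaderSketch {
    final Set<Integer> liveBrokerIds;
    int metadataUpdatesRequested = 0;

    StaleLeaderSketch(Set<Integer> liveBrokerIds) {
        this.liveBrokerIds = liveBrokerIds;
    }

    // Stand-in for a node provider pinned to one broker id.
    Integer provide(int cachedBrokerId) {
        if (liveBrokerIds.contains(cachedBrokerId)) {
            return cachedBrokerId;
        }
        // Models requesting a metadata update. A broker-only refresh cannot
        // bring back a broker that has left, so the next attempt fails the same way.
        metadataUpdatesRequested++;
        return null;
    }

    // Returns the tick at which the call could be sent, or -1 if the deadline expired.
    int pollUntilSentOrDeadline(int cachedBrokerId, int deadlineTicks) {
        for (int tick = 0; tick < deadlineTicks; tick++) {
            if (provide(cachedBrokerId) != null) {
                return tick;
            }
            // The call stays pending. Nothing was sent, so no response, failure,
            // or disconnect callback exists to push the key back to lookup.
        }
        return -1;
    }

    public static void main(String[] args) {
        StaleLeaderSketch client = new StaleLeaderSketch(Set.of(1, 2));
        System.out.println("sent at tick: " + client.pollUntilSentOrDeadline(3, 5));
        System.out.println("metadata updates requested: " + client.metadataUpdatesRequested);
    }
}
```

With a cached leader id that is absent from the live set, every iteration requests another metadata update and the model only exits when the deadline runs out, which mirrors the timeout reported here.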
So the call just keeps triggering broker-info ({{topics=[]}}) metadata
refreshes, which never re-resolve the partition leader, until the request
deadline expires.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)