[
https://issues.apache.org/jira/browse/KAFKA-19561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Manikumar resolved KAFKA-19561.
-------------------------------
Resolution: Fixed
> Request Timeout During SASL Reauthentication Due to Missed OP_WRITE interest set
> ----------------------------------------------------------------------------------
>
> Key: KAFKA-19561
> URL: https://issues.apache.org/jira/browse/KAFKA-19561
> Project: Kafka
> Issue Type: Bug
> Reporter: Manikumar
> Assignee: Manikumar
> Priority: Major
> Fix For: 3.9.2, 4.2.0, 4.1.2
>
>
> We've observed request timeouts occurring during SASL reauthentication, and
> analysis suggests the issue is caused by a race condition between request
> handling and SASL reauthentication on the broker side. Here’s the sequence:
> # Client sends a request (Req1) to the broker.
> # Client initiates SASL reauthentication.
> # Broker receives Req1.
> # Broker also begins SASL reauthentication.
> # While reauth is in progress:
> ** Broker completes processing of Req1 and prepares a response (Res1).
> ** Res1 is queued via KafkaChannel.send().
> ** Broker sets SelectionKey.OP_WRITE to indicate write readiness.
> ** However, Selector.attemptWrite() does not proceed because:
> *** channel.hasSend() is true, but
> *** channel.ready() is false (reauth is still in progress).
> # Once reauthentication completes, the broker removes SelectionKey.OP_WRITE from the key's interest set.
> # At this point:
> ** channel.hasSend() and channel.ready() are now true,
> ** But key.isWritable() is false, so the response (Res1) is never sent.
> # The response remains stuck in the send buffer. Client eventually hits a
> request timeout.
> The fix is to set write readiness again using SelectionKey.OP_WRITE at the end of
> Step 6, so that the pending response is picked up on the next poll. This is similar to [what we do on the client
> side|https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/security/authenticator/SaslClientAuthenticator.java#L422].
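The sequence above can be sketched as a small, self-contained simulation. This is an illustrative model only, not the actual patch or Kafka's code: FakeChannel, finishReauthentication(), and responseSent() are hypothetical names, the socket is assumed to always be writable whenever OP_WRITE is in the interest set, and the "fix" branch simply re-adds OP_WRITE unconditionally once reauthentication finishes.

```java
import java.nio.channels.SelectionKey;

public class ReauthWriteInterestSketch {

    /** Hypothetical stand-in for a broker-side channel; this is not Kafka's KafkaChannel. */
    static final class FakeChannel {
        int interestOps = SelectionKey.OP_READ;
        boolean ready = true;       // false while reauthentication is in progress
        String pendingSend = null;  // queued response, if any

        boolean hasSend() {
            return pendingSend != null;
        }

        /** Queue a response and register write interest (step 5). */
        void send(String response) {
            pendingSend = response;
            interestOps |= SelectionKey.OP_WRITE;
        }

        /** Step 6: reauthentication completes and OP_WRITE is removed. */
        void finishReauthentication(boolean reapplyWriteInterest) {
            interestOps &= ~SelectionKey.OP_WRITE;
            ready = true;
            if (reapplyWriteInterest) {
                // Assumed shape of the fix: set write interest again at the end of step 6.
                interestOps |= SelectionKey.OP_WRITE;
            }
        }

        /** Simplified write attempt: needs a pending send, a ready channel, and write interest. */
        boolean attemptWrite() {
            // Assumption: the socket is writable whenever OP_WRITE is in the interest set.
            boolean writable = (interestOps & SelectionKey.OP_WRITE) != 0;
            if (hasSend() && ready && writable) {
                pendingSend = null;
                interestOps &= ~SelectionKey.OP_WRITE;
                return true;
            }
            return false;
        }
    }

    /** Replays steps 4 to 7 and reports whether Res1 is sent after reauthentication. */
    public static boolean responseSent(boolean reapplyWriteInterest) {
        FakeChannel channel = new FakeChannel();
        channel.ready = false;                              // step 4: reauthentication begins
        channel.send("Res1");                               // step 5: response queued
        boolean sentDuringReauth = channel.attemptWrite();  // blocked: channel is not ready
        channel.finishReauthentication(reapplyWriteInterest); // step 6
        return !sentDuringReauth && channel.attemptWrite();   // step 7
    }

    public static void main(String[] args) {
        System.out.println("without fix, Res1 sent: " + responseSent(false));
        System.out.println("with fix, Res1 sent: " + responseSent(true));
    }
}
```

In this model, responseSent(false) leaves Res1 stuck because nothing restores write interest after step 6, while responseSent(true) lets the next write attempt flush it.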