[ https://issues.apache.org/jira/browse/KAFKA-19561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Manikumar updated KAFKA-19561:
------------------------------
    Fix Version/s: 3.9.2
                   4.2.0

> Request Timeout During SASL Reauthentication Due to Missed OP_WRITE interest set
> ----------------------------------------------------------------------------------
>
>                 Key: KAFKA-19561
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19561
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Manikumar
>            Assignee: Manikumar
>            Priority: Major
>             Fix For: 3.9.2, 4.2.0
>
>
> We've observed request timeouts occurring during SASL reauthentication, and
> analysis suggests the issue is caused by a race condition between request
> handling and SASL reauthentication on the broker side. Here's the sequence:
> # Client sends a request (Req1) to the broker.
> # Client initiates SASL reauthentication.
> # Broker receives Req1.
> # Broker also begins SASL reauthentication.
> # While reauth is in progress:
> ** Broker completes processing of Req1 and prepares a response (Res1).
> ** Res1 is queued via KafkaChannel.send().
> ** Broker sets SelectionKey.OP_WRITE to indicate write readiness.
> ** However, Selector.attemptWrite() does not proceed because:
> *** channel.hasSend() is true, but
> *** channel.ready() is false (reauth is still in progress).
> # Once reauthentication completes: Broker removes SelectionKey.OP_WRITE.
> # At this point:
> ** channel.hasSend() and channel.ready() are now true,
> ** But key.isWritable() is false, so the response (Res1) is never sent.
> # The response remains stuck in the send buffer. Client eventually hits a
> request timeout.
> The fix is to set write readiness using SelectionKey.OP_WRITE at the end of
> Step 6. This is similar to [what we do on client
> side|https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/security/authenticator/SaslClientAuthenticator.java#L422].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
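
The sequence described in the issue can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not Kafka's code: `ModelChannel`, `finishReauth`, and `run` are hypothetical names invented for this example, the socket is assumed to be always writable (so "writable" reduces to "OP_WRITE interest is set"), and only the three conditions named in the report (`hasSend()`, `ready()`, write interest) are modeled. The only real API used is the `SelectionKey` interest-op constants from `java.nio`.

```java
import java.nio.channels.SelectionKey;

public class ReauthWriteInterestSketch {

    /** Hypothetical stand-in for a broker-side channel; NOT Kafka's KafkaChannel. */
    static final class ModelChannel {
        int interestOps = SelectionKey.OP_READ;
        boolean ready = true;   // false while reauthentication is in progress
        String pendingSend;     // queued response, or null if nothing is queued
        String delivered;       // what actually reached the client, or null

        boolean hasSend() {
            return pendingSend != null;
        }

        /** Queue a response and register write interest (step 5 in the report). */
        void send(String response) {
            pendingSend = response;
            interestOps |= SelectionKey.OP_WRITE;
        }

        void beginReauth() {
            ready = false;
        }

        /**
         * Step 6: reauthentication completes and write interest is removed.
         * With restoreWriteInterest=true, interest is set again when a send
         * is still pending, which is the shape of the fix proposed in the report.
         */
        void finishReauth(boolean restoreWriteInterest) {
            interestOps &= ~SelectionKey.OP_WRITE;
            ready = true;
            if (restoreWriteInterest && hasSend()) {
                interestOps |= SelectionKey.OP_WRITE;
            }
        }

        /** Writes only if a send is pending, the channel is ready, and write interest is set. */
        void attemptWrite() {
            boolean writable = (interestOps & SelectionKey.OP_WRITE) != 0;
            if (hasSend() && ready && writable) {
                delivered = pendingSend;
                pendingSend = null;
                interestOps &= ~SelectionKey.OP_WRITE;
            }
        }
    }

    /** Replays the reported sequence; returns the delivered response or null. */
    static String run(boolean withFix) {
        ModelChannel ch = new ModelChannel();
        ch.beginReauth();          // broker begins reauthentication
        ch.send("Res1");           // Res1 queued while reauth is in progress
        ch.attemptWrite();         // no write: channel is not ready
        ch.finishReauth(withFix);  // reauth completes, OP_WRITE removed
        ch.attemptWrite();         // writes only if interest was restored
        return ch.delivered;
    }

    public static void main(String[] args) {
        System.out.println("without fix: " + run(false));
        System.out.println("with fix:    " + run(true));
    }
}
```

In this model the response stays queued forever when write interest is not restored, which corresponds to the client-side request timeout; restoring interest after reauthentication lets the pending response go out on the next write attempt.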