Manikumar created KAFKA-19561:
---------------------------------

Summary: Request Timeout During SASL Reauthentication Due to Missed OP_WRITE interest set
Key: KAFKA-19561
URL: https://issues.apache.org/jira/browse/KAFKA-19561
Project: Kafka
Issue Type: Bug
Reporter: Manikumar
Assignee: Manikumar

We've observed request timeouts during SASL reauthentication. Analysis suggests they are caused by a race condition between request handling and reauthentication on the broker side. Here’s the sequence:

# Client sends a request ({{Req1}}) to the broker.
# Client begins SASL reauthentication.
# Broker receives {{Req1}}.
# Broker also initiates SASL reauthentication.
# While reauth is in progress:
** Broker processes {{Req1}}, prepares {{Res1}}, and queues it via {{KafkaChannel.send()}}.
** Broker sets {{SelectionKey.OP_WRITE}} on the selection key to register write interest.
** However, {{Selector.attemptWrite()}} skips the send because:
*** {{channel.hasSend()}} is true, but
*** {{channel.ready()}} is false (since reauth is not yet complete).
# After reauth completes, broker removes {{OP_WRITE}} from the selection key.
# At this point:
** {{Res1}} is still pending in the channel.
** {{channel.hasSend()}} and {{channel.ready()}} are now both true.
** But {{key.isWritable()}} is false, so no further write is attempted.
# The response remains stuck in the send buffer. The client eventually hits a request timeout.

The fix is to set {{SelectionKey.OP_WRITE}} on the selection key again at the end of step 6, so that the pending response gets written. This is similar to [what we do on client side|https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/security/authenticator/SaslClientAuthenticator.java#L422].
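
The sequence above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Kafka's actual code: {{ToyChannel}}, {{finishReauth}}, and {{responseDelivered}} are hypothetical names, the model treats write interest as if the socket were immediately writable, and re-adding {{OP_WRITE}} only when a send is pending is an assumption of the sketch rather than something stated in the issue.

```java
import java.nio.channels.SelectionKey;

/** Toy model of the race described above; these are NOT Kafka's real classes. */
final class ReauthWriteSketch {

    /** Hypothetical stand-in for a broker-side channel plus its selection key. */
    static final class ToyChannel {
        int interestOps = SelectionKey.OP_READ;
        boolean ready = true;       // false while reauthentication is in progress
        String pendingSend = null;  // queued response, if any
        String delivered = null;    // what was actually written to the client

        boolean hasSend() {
            return pendingSend != null;
        }

        // Step 5: queue the response and register write interest.
        void send(String response) {
            pendingSend = response;
            interestOps |= SelectionKey.OP_WRITE;
        }

        // Writes only if write interest is set, a send is queued, and the channel is ready.
        // Simplification: interest in OP_WRITE is treated as "socket is writable".
        void attemptWrite() {
            boolean writable = (interestOps & SelectionKey.OP_WRITE) != 0;
            if (writable && hasSend() && ready) {
                delivered = pendingSend;
                pendingSend = null;
                interestOps &= ~SelectionKey.OP_WRITE;
            }
        }

        // Step 6: reauth completes and OP_WRITE is removed from the key.
        // With the proposed fix, OP_WRITE is set again afterwards.
        void finishReauth(boolean applyFix) {
            interestOps &= ~SelectionKey.OP_WRITE;
            ready = true;
            if (applyFix && hasSend()) { // pending-send check is an assumption
                interestOps |= SelectionKey.OP_WRITE;
            }
        }
    }

    /** Replays steps 5 to 7 and reports whether Res1 reached the client. */
    static boolean responseDelivered(boolean applyFix) {
        ToyChannel ch = new ToyChannel();
        ch.ready = false;          // reauth in progress
        ch.send("Res1");           // broker queues the response
        ch.attemptWrite();         // skipped: channel not ready
        ch.finishReauth(applyFix); // reauth completes
        ch.attemptWrite();         // next poll
        return ch.delivered != null;
    }

    public static void main(String[] args) {
        System.out.println("delivered without fix: " + responseDelivered(false)); // false
        System.out.println("delivered with fix: " + responseDelivered(true));     // true
    }
}
```

Without the fix, the second {{attemptWrite()}} finds no write interest and {{Res1}} stays queued, which corresponds to the client-side request timeout. With the fix, the same call flushes the response.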