Manikumar created KAFKA-19561:
---------------------------------

             Summary: Request Timeout During SASL Reauthentication Due to 
Missed OP_WRITE Interest Set
                 Key: KAFKA-19561
                 URL: https://issues.apache.org/jira/browse/KAFKA-19561
             Project: Kafka
          Issue Type: Bug
            Reporter: Manikumar
            Assignee: Manikumar


We've observed request timeouts occurring during SASL reauthentication, and 
analysis suggests the issue is caused by a race condition between request 
handling and reauthentication on the broker side. Here’s the sequence:


 # Client sends a request ({{Req1}}) to the broker.

 # Client begins SASL reauthentication.

 # Broker receives {{Req1}}.

 # Broker also initiates SASL reauthentication.

 # While reauth is in progress:

 ** Broker processes {{Req1}}, prepares {{Res1}}, and queues it via 
{{KafkaChannel.send()}}.

 ** Broker sets {{SelectionKey.OP_WRITE}} to indicate write readiness.

 ** However, {{Selector.attemptWrite()}} skips the send because:

 *** {{channel.hasSend()}} is true, but

 *** {{channel.ready()}} is false (since reauth is not yet complete).

 # After reauth completes, broker removes {{OP_WRITE}} from the selection key.

 # At this point:

 ** {{Res1}} is still pending in the channel.

 ** {{channel.hasSend()}} and {{channel.ready()}} are now both true.

 ** But {{key.isWritable()}} is false because {{OP_WRITE}} is no longer in the 
interest set, so no further write is attempted.

 # The response remains stuck in the send buffer, and the client eventually hits 
a request timeout.
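
The sequence above can be sketched as a small state model. This is only an illustration under stated assumptions, not Kafka code: the class {{ReauthWriteStall}} and its fields and methods are hypothetical stand-ins for the channel and selector state, and the socket is assumed to always have buffer space.

```java
import java.nio.channels.SelectionKey;

// Toy model of the broker-side state in the sequence above. This is NOT Kafka code:
// the class, fields, and methods are hypothetical stand-ins for the channel/selector.
final class ReauthWriteStall {
    boolean hasSend = false;          // a response is queued on the channel
    boolean ready = true;             // false while (re)authentication is in progress
    int interestOps = SelectionKey.OP_READ;
    boolean responseWritten = false;

    // Queue a response and register write interest.
    void send() {
        hasSend = true;
        interestOps |= SelectionKey.OP_WRITE;
    }

    // One selector pass; assumes the socket itself always has buffer space,
    // so the key is writable whenever OP_WRITE is in the interest set.
    void poll() {
        boolean writable = (interestOps & SelectionKey.OP_WRITE) != 0;
        if (writable && hasSend && ready) {
            responseWritten = true;
            hasSend = false;
            interestOps &= ~SelectionKey.OP_WRITE;
        }
    }

    // Reauthentication finishes and write interest is dropped (step 6).
    void completeReauth() {
        ready = true;
        interestOps &= ~SelectionKey.OP_WRITE;
    }

    // Replays steps 5 to 7 and returns the final state.
    static ReauthWriteStall run() {
        ReauthWriteStall ch = new ReauthWriteStall();
        ch.ready = false;      // reauthentication starts
        ch.send();             // Res1 queued, OP_WRITE set
        ch.poll();             // skipped: hasSend is true but ready is false
        ch.completeReauth();   // OP_WRITE removed
        ch.poll();             // skipped: key is no longer writable
        return ch;
    }

    public static void main(String[] args) {
        ReauthWriteStall ch = run();
        System.out.println("hasSend=" + ch.hasSend + " ready=" + ch.ready
                + " responseWritten=" + ch.responseWritten);
    }
}
```

In this model the final state has a pending send on a ready channel, yet nothing is ever written, which matches the stall described in the last two steps.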


The fix is to set write readiness using {{SelectionKey.OP_WRITE}} at the end of 
step 6. This is similar to [what we do on the client 
side|https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/security/authenticator/SaslClientAuthenticator.java#L422].
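
Why the missing interest bit matters can be seen with plain {{java.nio}}, independent of Kafka. The sketch below is a minimal illustration, not the actual patch: it uses a {{Pipe}} sink as a stand-in for the broker's socket channel, and the class name {{WriteInterestDemo}} is hypothetical. A selector only reports a key as writable when {{OP_WRITE}} is in its interest set, so OR-ing the bit back in is what lets the next poll pick up the pending send.

```java
import java.io.IOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Plain java.nio demonstration; a Pipe sink stands in for the broker's socket channel.
final class WriteInterestDemo {
    // Returns {writable without OP_WRITE interest, writable after adding OP_WRITE}.
    static boolean[] run() throws IOException {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try {
                pipe.sink().configureBlocking(false);
                // Register with an empty interest set, as after OP_WRITE was removed.
                SelectionKey key = pipe.sink().register(selector, 0);
                selector.selectNow();
                boolean before = key.isWritable();

                // Re-add write interest without disturbing any other interest bits.
                key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
                selector.selectNow();
                boolean after = key.isWritable();
                return new boolean[] {before, after};
            } finally {
                pipe.sink().close();
                pipe.source().close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] r = run();
        System.out.println("writable before=" + r[0] + ", after=" + r[1]);
    }
}
```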



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
