Ron Dagostino created KAFKA-7902:
------------------------------------

             Summary: SASL/OAUTHBEARER can become unable to connect: 
javax.security.sasl.SaslException: Unable to find OAuth Bearer token in 
Subject's private credentials (size=2) 
                 Key: KAFKA-7902
                 URL: https://issues.apache.org/jira/browse/KAFKA-7902
             Project: Kafka
          Issue Type: Bug
          Components: clients
    Affects Versions: 2.1.0, 2.0.1, 2.0.0
            Reporter: Ron Dagostino
            Assignee: Ron Dagostino


It is possible for a Java SASL/OAUTHBEARER client (either a non-broker 
producer/consumer client or a broker when acting as an inter-broker client) to 
end up in a state where it cannot connect to a new broker (or, if 
re-authentication as implemented by KIP-368 and merged for v2.2.0 were to be 
deployed and enabled, to be unable to re-authenticate). The error message looks 
like this:

{{Connection to node 1 failed authentication due to: An error: 
(java.security.PrivilegedActionException: javax.security.sasl.SaslException: 
Unable to find OAuth Bearer token in Subject's private credentials (size=2) 
[Caused by java.io.IOException: Unable to find OAuth Bearer token in Subject's 
private credentials (size=2)]) occurred when evaluating SASL token received 
from the Kafka Broker. Kafka Client will go to AUTHENTICATION_FAILED state.}}

The root cause of the problem begins at this point in the code:

[https://github.com/apache/kafka/blob/2.0/clients/src/main/java/org/apache/kafka/common/security/oauthbearer/internals/expiring/ExpiringCredentialRefreshingLogin.java#L378]:

If the {{loginContext.login()}} call on line 381 throws an exception, the 
{{loginContext}} field is not restored to the previous value stored away in the 
{{optionalLoginContextToLogout}} variable. *This is an unusual event* (the 
OAuth authorization server must be unavailable at the moment when the token 
refresh occurs), but when it does happen it puts the refresher thread instance 
in an invalid state: its {{loginContext}} field now refers to the login context 
that failed instead of the original one, which is lost. The current 
{{loginContext}} can't be logged out (it will throw an 
{{IllegalStateException}} if that is attempted because there is no token 
associated with it), and the token associated with the lost login context can 
never be logged out and removed from the Subject's private credentials because 
we don't retain a reference to that context. The net effect is that we end up 
with an extra token on the Subject's private credentials, which eventually 
results in the exception mentioned above when the client tries to authenticate 
to a broker.
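
One way to avoid losing the original login context is to put the old reference 
back whenever the new login attempt throws. The following is only a minimal 
sketch of that idea and is not the Kafka code: {{SketchLoginContext}} and 
{{RestoringRefresher}} are hypothetical stand-ins, a plain {{Set}} models the 
Subject's private credentials, and the {{serverAvailable}} flag simulates 
whether the authorization server can be reached.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a login context; not Kafka's or the JDK's class. */
class SketchLoginContext {
    private static int nextTokenId = 1;
    private final Set<String> credentials; // models the Subject's private credentials
    private String token;                  // stays null until login() succeeds

    SketchLoginContext(Set<String> credentials) {
        this.credentials = credentials;
    }

    void login(boolean serverAvailable) {
        if (!serverAvailable)
            throw new RuntimeException("authorization server unavailable");
        token = "token-" + nextTokenId++;
        credentials.add(token);
    }

    void logout() {
        if (token == null)
            throw new IllegalStateException("no token associated with this context");
        credentials.remove(token);
        token = null;
    }
}

/** Hypothetical refresher that restores the previous context when re-login fails. */
class RestoringRefresher {
    final Set<String> credentials = new HashSet<>();
    SketchLoginContext loginContext;

    RestoringRefresher() {
        loginContext = new SketchLoginContext(credentials);
        loginContext.login(true); // initial login puts one token on the "Subject"
    }

    void reLogin(boolean serverAvailable) {
        SketchLoginContext previous = loginContext;
        loginContext = new SketchLoginContext(credentials);
        try {
            loginContext.login(serverAvailable);
        } catch (RuntimeException e) {
            // Restore the old context so its token can still be logged out later.
            loginContext = previous;
            throw e;
        }
        // The new login succeeded, so the old token can be removed now.
        previous.logout();
    }
}
```

In this sketch a failed {{reLogin(false)}} leaves both the field and the 
credential set unchanged, so a later successful {{reLogin(true)}} still logs 
out the original token and exactly one token remains.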

So the chain of events is:

1) login failure upon token refresh causes the refresher thread's login context 
field to be incorrect, and the existing token on the Subject's private 
credentials will never be logged out/removed
2) retry occurs in 10 seconds, potentially repeatedly until the authorization 
server is back online
3) login succeeds, adding a second token to the Subject's private credentials 
(logout is then called on the login context that was set incorrectly during the 
most recent failure, e.g. in step 1, which results in an exception; that 
exception is not the real issue, though: the real issue is the 2 tokens on the 
Subject's private credentials)
4) At this point we have 2 tokens on the Subject. At some point in the future 
the client tries to make a new connection, sees the 2 tokens, and throws an 
exception. BOOM! The client is now unable to connect (or re-authenticate, if 
applicable) going forward.
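
The chain of events above can be replayed in miniature. This is a simplified 
model under stated assumptions, not the Kafka implementation: {{DemoToken}} and 
{{DemoLoginContext}} are hypothetical stand-ins, the {{serverUp}} flag 
simulates authorization server availability, and only the JDK's 
{{javax.security.auth.Subject}} is real.

```java
import javax.security.auth.Subject;

class TokenChainDemo {
    /** Hypothetical stand-in for an OAuth bearer token. */
    static final class DemoToken {
        final int id;
        DemoToken(int id) { this.id = id; }
    }

    /** Hypothetical stand-in for a login context bound to one Subject. */
    static final class DemoLoginContext {
        private final Subject subject;
        private DemoToken token; // stays null until login() succeeds

        DemoLoginContext(Subject subject) { this.subject = subject; }

        void login(boolean serverUp, int tokenId) {
            if (!serverUp)
                throw new RuntimeException("authorization server unavailable");
            token = new DemoToken(tokenId);
            subject.getPrivateCredentials().add(token);
        }

        void logout() {
            if (token == null)
                throw new IllegalStateException("no token to log out");
            subject.getPrivateCredentials().remove(token);
            token = null;
        }
    }

    /** Replays steps 1-4 and returns how many tokens remain on the Subject. */
    static int replay() {
        Subject subject = new Subject();
        DemoLoginContext loginContext = new DemoLoginContext(subject);
        loginContext.login(true, 1); // initial login: one token on the Subject

        // Step 1: refresh while the server is down. The field is overwritten
        // before login() is known to succeed and is not restored on failure.
        loginContext = new DemoLoginContext(subject);
        try {
            loginContext.login(false, 2);
        } catch (RuntimeException e) {
            // The original context (holding token 1) is no longer referenced.
        }

        // Steps 2-3: a retry succeeds and adds a second token. Logging out the
        // "previous" context fails because it is the one that never logged in.
        DemoLoginContext previous = loginContext;
        loginContext = new DemoLoginContext(subject);
        loginContext.login(true, 3);
        try {
            previous.logout();
        } catch (IllegalStateException e) {
            // Not the real issue; token 1 is simply never removed.
        }

        // Step 4: the Subject now carries two tokens.
        return subject.getPrivateCredentials(DemoToken.class).size();
    }
}
```

Calling {{TokenChainDemo.replay()}} returns 2 in this model, which corresponds 
to the two-token state that the client later rejects when it tries to connect.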



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
