Ron Dagostino created KAFKA-7902:
------------------------------------
Summary: SASL/OAUTHBEARER can become unable to connect:
javax.security.sasl.SaslException: Unable to find OAuth Bearer token in
Subject's private credentials (size=2)
Key: KAFKA-7902
URL: https://issues.apache.org/jira/browse/KAFKA-7902
Project: Kafka
Issue Type: Bug
Components: clients
Affects Versions: 2.1.0, 2.0.1, 2.0.0
Reporter: Ron Dagostino
Assignee: Ron Dagostino
It is possible for a Java SASL/OAUTHBEARER client (either a non-broker
producer/consumer client or a broker when acting as an inter-broker client) to
end up in a state where it cannot connect to a new broker (or cannot
re-authenticate, if re-authentication as implemented by KIP-368 and merged for
v2.2.0 were to be deployed and enabled). The error message looks
like this:
{{Connection to node 1 failed authentication due to: An error:
(java.security.PrivilegedActionException: javax.security.sasl.SaslException:
Unable to find OAuth Bearer token in Subject's private credentials (size=2)
[Caused by java.io.IOException: Unable to find OAuth Bearer token in Subject's
private credentials (size=2)]) occurred when evaluating SASL token received
from the Kafka Broker. Kafka Client will go to AUTHENTICATION_FAILED state.}}
The root cause of the problem begins at this point in the code:
[https://github.com/apache/kafka/blob/2.0/clients/src/main/java/org/apache/kafka/common/security/oauthbearer/internals/expiring/ExpiringCredentialRefreshingLogin.java#L378]
The {{loginContext}} field doesn't get replaced with the old version stored
away in the {{optionalLoginContextToLogout}} variable if/when the
{{loginContext.login()}} call on line 381 throws an exception. *This is an
unusual event* – the OAuth authorization server must be unavailable at the
moment when the token refresh occurs – but when it does happen it puts the
refresher thread instance in an invalid state because now its {{loginContext}}
field represents the one that failed instead of the original one, which is now
lost. The current {{loginContext}} can't be logged out, because there is no
token associated with it and attempting to do so throws an
{{IllegalStateException}}. The token associated with the lost login context
can never be logged out and removed from the Subject's private credentials,
because no reference to that context is retained. The net effect is that we end up
with an extra token on the Subject's private credentials, which eventually
results in the exception mentioned above when the client tries to authenticate
to a broker.
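The state problem described above can be illustrated with a minimal sketch. This is not the actual Kafka code: {{SketchLoginContext}}, {{RefresherSketch}}, the {{restoreOnFailure}} flag and {{tokensAfterOutage}} are hypothetical names, a plain list stands in for the Subject's private credentials, and restoring the previous login context on failure is shown only as one possible remedy.

```java
import java.util.ArrayList;
import java.util.List;
import javax.security.auth.login.LoginException;

// Hypothetical stand-in for a JAAS LoginContext; not Kafka's real class.
final class SketchLoginContext {
    private static int counter = 0;
    private final List<String> credentials; // models the Subject's private credentials
    private final boolean serverUp;         // models authorization server availability
    private String token;                   // null until login() succeeds

    SketchLoginContext(List<String> credentials, boolean serverUp) {
        this.credentials = credentials;
        this.serverUp = serverUp;
    }

    void login() throws LoginException {
        if (!serverUp)
            throw new LoginException("authorization server unavailable");
        token = "token-" + (++counter);
        credentials.add(token);
    }

    void logout() {
        if (token == null)
            throw new IllegalStateException("no token associated with this login context");
        credentials.remove(token);
        token = null;
    }
}

// Hypothetical refresher holding a single loginContext field.
final class RefresherSketch {
    final List<String> credentials = new ArrayList<>();
    private SketchLoginContext loginContext;
    private final boolean restoreOnFailure;

    RefresherSketch(boolean restoreOnFailure) throws LoginException {
        this.restoreOnFailure = restoreOnFailure;
        loginContext = new SketchLoginContext(credentials, true);
        loginContext.login(); // initial login: one token on the Subject
    }

    void reLogin(boolean serverUp) throws LoginException {
        SketchLoginContext previous = loginContext;
        loginContext = new SketchLoginContext(credentials, serverUp);
        try {
            loginContext.login();
        } catch (LoginException e) {
            if (restoreOnFailure)
                loginContext = previous; // keep the context that still owns a token
            throw e;
        }
        previous.logout(); // remove the token that was just replaced
    }

    // Runs a failed refresh followed by a successful one; returns the token count.
    static int tokensAfterOutage(boolean restoreOnFailure) {
        try {
            RefresherSketch r = new RefresherSketch(restoreOnFailure);
            try { r.reLogin(false); } catch (LoginException expected) { }
            try { r.reLogin(true); } catch (IllegalStateException fromLogout) { }
            return r.credentials.size();
        } catch (LoginException e) {
            throw new AssertionError(e);
        }
    }
}
```

In this sketch {{RefresherSketch.tokensAfterOutage(false)}} returns 2, because the failed context replaces the original and the original's token is never removed, while {{RefresherSketch.tokensAfterOutage(true)}} returns 1.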
So the chain of events is:
1) login failure upon token refresh causes the refresher thread's login context
field to be incorrect, and the existing token on the Subject's private
credentials will never be logged out/removed
2) retry occurs in 10 seconds, potentially repeatedly until the authorization
server is back online
3) login succeeds, adding a second token to the Subject's private credentials.
Logout is then called on the login context that was set incorrectly in the most
recent failure (e.g. in step 1), which results in an exception, but that
exception is not the real issue; the real issue is the 2 tokens on the
Subject's private credentials
4) At this point we now have 2 tokens on the Subject. When the client tries to
make a new connection at some point in the future, it sees the 2 tokens and
throws an exception. The client is now unable to connect (or re-authenticate,
if applicable) going forward.
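Step 4 can be sketched as follows, assuming (as the "size=2" in the error message suggests) that the client requires exactly one token in the Subject's private credentials. {{TokenLookupSketch}}, {{BearerToken}}, {{findToken}} and {{canAuthenticate}} are hypothetical names for illustration, not Kafka classes; only {{javax.security.auth.Subject}} is the real JDK type.

```java
import java.io.IOException;
import java.util.Set;
import javax.security.auth.Subject;

final class TokenLookupSketch {
    // Hypothetical token type; stands in for the real OAuth bearer token class.
    static final class BearerToken {
        final String value;
        BearerToken(String value) { this.value = value; }
    }

    // Assumption: authentication requires exactly one token on the Subject.
    static BearerToken findToken(Subject subject) throws IOException {
        Set<BearerToken> tokens = subject.getPrivateCredentials(BearerToken.class);
        if (tokens.size() != 1)
            throw new IOException(String.format(
                "Unable to find OAuth Bearer token in Subject's private credentials (size=%d)",
                tokens.size()));
        return tokens.iterator().next();
    }

    // Convenience wrapper: true if a single token could be found.
    static boolean canAuthenticate(Subject subject) {
        try {
            findToken(subject);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

With one {{BearerToken}} added to a new Subject's private credentials, {{canAuthenticate}} returns true; once a second token is added and the first is never removed, it returns false on every subsequent attempt.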