Vinod Kone created MESOS-1866:
---------------------------------
Summary: Race between ~Authenticator() and Authenticator::authenticate() can lead to schedulers/slaves to never get authenticated
Key: MESOS-1866
URL: https://issues.apache.org/jira/browse/MESOS-1866
Project: Mesos
Issue Type: Bug
Reporter: Vinod Kone
Assignee: Vinod Kone
Priority: Critical
The master might receive a duplicate authenticate() request while a previous
authentication attempt is still in progress. Depending on what the
AuthenticatorProcess is executing at that moment, there are two possible race
conditions, either of which causes the scheduler/slave to retry authentication
indefinitely without ever succeeding.
We have seen both of these race conditions in a heavily loaded production cluster.
Race 1:
----------
--> An authenticate() event was dispatched to AuthenticatorProcess
(Master::authenticate() called Authenticator::authenticate())
--> A terminate() event was then injected at the front of the
AuthenticatorProcess queue (a duplicate Master::authenticate() call destroyed
the Authenticator via ~Authenticator()) before the above authenticate() event
was executed.
--> Due to a bug in libprocess, the dispatched future obtained in
Master::authenticate() was never transitioned to discarded, so
Master::_authenticate() was never called.
--> As a result, every subsequent authentication retry was queued on the
master, waiting for a Master::_authenticate() call that never came.
Fix: Transition the dispatched future to discarded if the target libprocess
process is terminated before the dispatched event runs
(https://reviews.apache.org/r/25945/)
Race 2:
-----------
--> An authenticate() event was dispatched to AuthenticatorProcess
(Master::authenticate() called Authenticator::authenticate())
--> AuthenticatorProcess::authenticate() executed and registered
promise.onDiscard(defer(self, Self::discarded)). NOTE: The internal promise of
AuthenticatorProcess is discarded in AuthenticatorProcess::discarded().
--> A terminate() event was then injected at the front of the
AuthenticatorProcess queue (a duplicate Master::authenticate() call destroyed
the Authenticator via ~Authenticator()) before the above discarded() event was
executed.
--> AuthenticatorProcess was destroyed without ever discarding the internal
promise, so Master::_authenticate() was never called.
--> As a result, every subsequent authentication retry was queued on the
master, waiting for a Master::_authenticate() call that never came.
Fix: Discard the internal promise when the AuthenticatorProcess is destroyed
(i.e. in ~AuthenticatorProcess()).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)