Vinod Kone created MESOS-1866:
---------------------------------

             Summary: Race between ~Authenticator() and 
Authenticator::authenticate() can cause schedulers/slaves to never get 
authenticated
                 Key: MESOS-1866
                 URL: https://issues.apache.org/jira/browse/MESOS-1866
             Project: Mesos
          Issue Type: Bug
            Reporter: Vinod Kone
            Assignee: Vinod Kone
            Priority: Critical


The master might receive a duplicate authenticate() request while a previous 
authentication attempt is still in progress. Depending on what the 
AuthenticatorProcess is executing at the time, there are two possible race 
conditions, either of which causes the scheduler/slave to retry 
authentication continuously without ever succeeding.

We have seen both race conditions in a heavily loaded production cluster.

Race 1:
----------
--> An authenticate() event was dispatched to AuthenticatorProcess 
(Master::authenticate() called Authenticator::authenticate())

--> A terminate() event was then injected at the front of the 
AuthenticatorProcess queue (a duplicate Master::authenticate() call ran 
~Authenticator()) before the above authenticate() event was executed.

--> Due to a bug in libprocess, the future returned by Master::authenticate() 
was never transitioned to discarded, so Master::_authenticate() was never 
called.

--> This caused all the subsequent authentication retries to be enqueued on the 
master waiting for Master::_authenticate() to be executed.

Fix: Transition the dispatched future to discarded if the libprocess process 
is terminated (https://reviews.apache.org/r/25945/)
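
Race 1 and its fix can be illustrated with a small, self-contained model. 
This is only a sketch under stated assumptions: ToyFuture, ToyProcess, Event 
and callbackRuns() are hypothetical stand-ins invented for illustration, not 
the libprocess API and not the code in the review above. The queue simply 
drops or discards pending dispatches when it reaches a terminate event.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a libprocess future (NOT the real API).
enum class State { PENDING, READY, DISCARDED };

struct ToyFuture {
  State state = State::PENDING;
  std::vector<std::function<void(State)>> callbacks;

  // Run the callback once the future leaves PENDING, whatever the outcome.
  void onAny(std::function<void(State)> f) {
    if (state != State::PENDING) {
      f(state);
    } else {
      callbacks.push_back(std::move(f));
    }
  }

  // Move out of PENDING exactly once and fire the callbacks.
  void transition(State s) {
    if (state != State::PENDING) return;
    state = s;
    for (auto& f : callbacks) f(s);
    callbacks.clear();
  }
};

struct Event {
  bool terminate;
  std::shared_ptr<ToyFuture> future;  // Null for terminate events.
};

// Hypothetical stand-in for a process with an event queue.
class ToyProcess {
public:
  // Enqueue a dispatch at the back and hand its future to the caller.
  std::shared_ptr<ToyFuture> dispatch() {
    auto future = std::make_shared<ToyFuture>();
    queue.push_back(Event{false, future});
    return future;
  }

  // Inject terminate at the FRONT, ahead of already queued dispatches.
  void terminate() { queue.push_front(Event{true, nullptr}); }

  void run(bool discardOnTerminate) {
    while (!queue.empty()) {
      Event e = queue.front();
      queue.pop_front();
      if (e.terminate) {
        if (discardOnTerminate) {
          // The fix: discard futures of dispatches that will never run.
          for (auto& pending : queue) {
            if (pending.future) pending.future->transition(State::DISCARDED);
          }
        }
        queue.clear();  // Without the fix these futures stay PENDING forever.
        return;
      }
      assert(e.future != nullptr);
      e.future->transition(State::READY);
    }
  }

private:
  std::deque<Event> queue;
};

// Returns whether the caller's continuation (the analogue of
// Master::_authenticate()) ever runs.
bool callbackRuns(bool discardOnTerminate) {
  ToyProcess process;
  bool called = false;
  process.dispatch()->onAny([&called](State) { called = true; });
  process.terminate();  // Jumps ahead of the queued dispatch.
  process.run(discardOnTerminate);
  return called;
}
```

In this model, callbackRuns(false) returns false, which corresponds to the 
stalled retries described above, while callbackRuns(true) returns true 
because the pending future is discarded on termination.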

Race 2:
-----------
--> An authenticate() event was dispatched to AuthenticatorProcess 
(Master::authenticate() called Authenticator::authenticate())

--> AuthenticatorProcess::authenticate() executed and set 
promise.onDiscard(defer(self, Self::discarded)). NOTE: The internal promise of 
AuthenticatorProcess is discarded in AuthenticatorProcess::discarded()

--> A terminate() event was then injected at the front of the 
AuthenticatorProcess queue (a duplicate Master::authenticate() call ran 
~Authenticator()) before the above discarded() event was executed.

--> AuthenticatorProcess was destructed without ever discarding the internal 
promise, so Master::_authenticate() was never called.

--> This caused all the subsequent authentication retries to be enqueued on the 
master waiting for Master::_authenticate() to be executed.

Fix: Discard the internal promise when the AuthenticatorProcess is 
destructed.
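
The idea behind this fix can be sketched in a few lines. Again, this is a 
hedged illustration only: SharedState, ToyAuthenticatorProcess and 
masterCallbackRuns() are hypothetical names made up for this example and do 
not reflect the actual Mesos or libprocess classes. The destructor settles 
the internal promise as discarded if nothing else has settled it.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical stand-in for promise/future shared state (NOT libprocess).
enum class State { PENDING, READY, DISCARDED };

struct SharedState {
  State state = State::PENDING;
  std::vector<std::function<void(State)>> callbacks;
};

// Hypothetical stand-in for AuthenticatorProcess and its internal promise.
class ToyAuthenticatorProcess {
public:
  explicit ToyAuthenticatorProcess(bool discardInDestructor)
    : discardInDestructor_(discardInDestructor),
      shared_(std::make_shared<SharedState>()) {}

  ~ToyAuthenticatorProcess() {
    // The fix: never leave the internal promise pending on destruction.
    if (discardInDestructor_) settle(State::DISCARDED);
  }

  // Hand out the future side of the internal promise.
  std::shared_ptr<SharedState> authenticate() { return shared_; }

  // Normal completion path, unused in the race below.
  void complete() { settle(State::READY); }

private:
  // Leave PENDING at most once and fire the registered callbacks.
  void settle(State s) {
    if (shared_->state != State::PENDING) return;
    shared_->state = s;
    for (auto& f : shared_->callbacks) f(s);
    shared_->callbacks.clear();
  }

  bool discardInDestructor_;
  std::shared_ptr<SharedState> shared_;
};

// Returns whether the caller's continuation (the analogue of
// Master::_authenticate()) runs when the process is destroyed first.
bool masterCallbackRuns(bool discardInDestructor) {
  bool called = false;
  {
    ToyAuthenticatorProcess process(discardInDestructor);
    auto future = process.authenticate();
    future->callbacks.push_back([&called](State) { called = true; });
    // Leaving this scope destroys the process before any discarded()
    // handler could run, standing in for terminate() jumping the queue.
  }
  return called;
}
```

Here masterCallbackRuns(false) returns false (the promise is abandoned while 
pending), and masterCallbackRuns(true) returns true because the destructor 
discards it.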



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
