[ https://issues.apache.org/jira/browse/MESOS-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dominic Hamon updated MESOS-1866:
---------------------------------
    Sprint: Mesos Q3 Sprint 7

> Race between ~Authenticator() and Authenticator::authenticate() can cause 
> schedulers/slaves to never get authenticated
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-1866
>                 URL: https://issues.apache.org/jira/browse/MESOS-1866
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Vinod Kone
>            Assignee: Vinod Kone
>            Priority: Critical
>
> The master might get a duplicate authenticate() request while a previous 
> authentication attempt is in progress. Depending on what the 
> AuthenticatorProcess is executing at the time, there are two possible race 
> conditions, either of which causes the scheduler/slave to retry 
> authentication continuously without ever succeeding.
> We have seen both race conditions in a heavily loaded production cluster.
> Race 1:
> -----------
> --> An authenticate() event was dispatched to AuthenticatorProcess 
> (Master::authenticate() called Authenticator::authenticate()).
> --> A terminate() event was then injected into the front of the 
> AuthenticatorProcess queue (a duplicate Master::authenticate() call 
> destroyed the in-progress Authenticator via ~Authenticator()) before the 
> above authenticate() event was executed.
> --> Due to a bug in libprocess, the future that Master::authenticate() 
> was waiting on was never transitioned to discarded, so 
> Master::_authenticate() was never called.
> --> As a result, all subsequent authentication retries were queued on 
> the master, waiting for Master::_authenticate() to execute.
> Fix: Transition the dispatched future to discarded if the libprocess 
> Process is terminated (https://reviews.apache.org/r/25945/).
> Race 2:
> -----------
> --> An authenticate() event was dispatched to AuthenticatorProcess 
> (Master::authenticate() called Authenticator::authenticate()).
> --> AuthenticatorProcess::authenticate() executed and set 
> promise.onDiscard(defer(self, Self::discarded)). NOTE: The internal promise 
> of AuthenticatorProcess is discarded in AuthenticatorProcess::discarded().
> --> A terminate() event was then injected into the front of the 
> AuthenticatorProcess queue (a duplicate Master::authenticate() call 
> destroyed the in-progress Authenticator via ~Authenticator()) before the 
> above discarded() event was executed.
> --> AuthenticatorProcess was destroyed (~AuthenticatorProcess() ran) 
> without ever discarding the internal promise, so Master::_authenticate() 
> was never called.
> --> As a result, all subsequent authentication retries were queued on 
> the master, waiting for Master::_authenticate() to execute.
> Fix: Discard the internal promise when AuthenticatorProcess is destroyed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
