[ https://issues.apache.org/jira/browse/MESOS-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dominic Hamon updated MESOS-1866:
---------------------------------
Sprint: Mesos Q3 Sprint 7
> Race between ~Authenticator() and Authenticator::authenticate() can lead to
> schedulers/slaves to never get authenticated
> ------------------------------------------------------------------------------------------------------------------------
>
> Key: MESOS-1866
> URL: https://issues.apache.org/jira/browse/MESOS-1866
> Project: Mesos
> Issue Type: Bug
> Reporter: Vinod Kone
> Assignee: Vinod Kone
> Priority: Critical
>
> The master might receive a duplicate authenticate() request while a previous
> authentication attempt is still in progress. Depending on what the
> AuthenticatorProcess is executing at that moment, either of two race
> conditions can cause the scheduler/slave to retry authentication
> continuously without ever succeeding.
> We have seen both race conditions in a heavily loaded production cluster.
> Race 1:
> ----------
> --> An authenticate() event was dispatched to AuthenticatorProcess
> (Master::authenticate() called Authenticator::authenticate())
> --> A terminate() event was then injected at the front of the
> AuthenticatorProcess queue (a duplicate Master::authenticate() call invoked
> ~Authenticator()) before the above authenticate() event was executed.
> --> Due to a bug in libprocess, the future returned by
> Master::authenticate() was never transitioned to discarded
> (Master::_authenticate() was never called).
> --> This caused all the subsequent authentication retries to be enqueued on
> the master waiting for Master::_authenticate() to be executed.
> Fix: Transition the dispatched future to discarded if the libprocess process
> is terminated (https://reviews.apache.org/r/25945/)
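
Race 1 can be illustrated with a toy, single-threaded model of a process mailbox. This is only a sketch under stated assumptions: `ToyProcess`, `State`, and `replayRace1` are hypothetical names invented here, and none of this is libprocess code or the patch from the review above. It shows only the idea that events dropped by an injected terminate() should have their futures moved to discarded rather than left pending.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Toy stand-in for the state of a future (hypothetical, not libprocess).
enum class State { Pending, Ready, Discarded };

// Toy single-threaded "process" that owns a mailbox of events.
class ToyProcess {
public:
  explicit ToyProcess(bool discardOnTerminate)
    : discardOnTerminate_(discardOnTerminate) {}

  // Enqueue work at the back of the mailbox and return its future state.
  std::shared_ptr<State> dispatch(std::function<void()> work) {
    auto state = std::make_shared<State>(State::Pending);
    mailbox_.push_back(Event{false, std::move(work), state});
    return state;
  }

  // Inject a terminate event at the FRONT of the mailbox.
  void terminate() {
    mailbox_.push_front(Event{true, nullptr, nullptr});
  }

  // Drain the mailbox until it is empty or a terminate event is reached.
  void run() {
    while (!mailbox_.empty()) {
      Event event = std::move(mailbox_.front());
      mailbox_.pop_front();
      if (event.isTerminate) {
        // Everything still queued will never execute. With the fix
        // enabled, mark those futures discarded instead of leaving
        // them pending forever.
        for (Event& dropped : mailbox_) {
          if (discardOnTerminate_ && dropped.state) {
            *dropped.state = State::Discarded;
          }
        }
        mailbox_.clear();
        return;
      }
      assert(event.work);
      event.work();
      *event.state = State::Ready;
    }
  }

private:
  struct Event {
    bool isTerminate;
    std::function<void()> work;
    std::shared_ptr<State> state;
  };

  bool discardOnTerminate_;
  std::deque<Event> mailbox_;
};

// Replays Race 1: authenticate() is dispatched, then terminate() jumps
// ahead of it in the queue. Returns the final state of the future.
State replayRace1(bool discardOnTerminate) {
  ToyProcess process(discardOnTerminate);
  std::shared_ptr<State> future = process.dispatch([] {});
  process.terminate();
  process.run();
  return *future;
}
```

In this model, `replayRace1(false)` leaves the future pending, which corresponds to Master::_authenticate() never being called, while `replayRace1(true)` moves it to discarded so that a waiting continuation could fire.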
> Race 2:
> -----------
> --> An authenticate() event was dispatched to AuthenticatorProcess
> (Master::authenticate() called Authenticator::authenticate())
> --> AuthenticatorProcess::authenticate() executed and set
> promise.onDiscard(defer(self, Self::discarded)). NOTE: The internal promise
> of AuthenticatorProcess is discarded in AuthenticatorProcess::discarded()
> --> A terminate() event was then injected at the front of the
> AuthenticatorProcess queue (a duplicate Master::authenticate() call invoked
> ~Authenticator()) before the above discarded() event was executed.
> --> AuthenticatorProcess was destructed without ever discarding the internal
> promise (so Master::_authenticate() was never called).
> --> This caused all the subsequent authentication retries to be enqueued on
> the master waiting for Master::_authenticate() to be executed.
> Fix: Discard the internal promise when the AuthenticatorProcess is
> destructed.
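
The Race 2 fix can be sketched in the same toy style. Again, this is an illustration under assumptions, not the actual Mesos change: `ToyPromise`, `ToyAuthenticatorProcess`, and `replayRace2` are hypothetical names, and the callback registered with `onAny` merely stands in for the continuation that would lead to Master::_authenticate(). The point is that discarding the internal promise in the destructor guarantees the waiting continuation runs even when the deferred discarded() event is never executed.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Toy stand-in for the state of a promise (hypothetical, not libprocess).
enum class State { Pending, Ready, Discarded };

// Toy promise that notifies registered callbacks once it completes.
class ToyPromise {
public:
  // Register a callback; this toy only supports registering while pending.
  void onAny(std::function<void(State)> callback) {
    assert(state_ == State::Pending);
    callbacks_.push_back(std::move(callback));
  }

  void set() { complete(State::Ready); }
  void discard() { complete(State::Discarded); }

private:
  // Complete at most once and notify every registered callback.
  void complete(State state) {
    if (state_ != State::Pending) {
      return;
    }
    state_ = state;
    for (auto& callback : callbacks_) {
      callback(state_);
    }
  }

  State state_ = State::Pending;
  std::vector<std::function<void(State)>> callbacks_;
};

// Toy authenticator process that owns the internal promise.
class ToyAuthenticatorProcess {
public:
  explicit ToyAuthenticatorProcess(bool discardInDestructor)
    : discardInDestructor_(discardInDestructor) {}

  ~ToyAuthenticatorProcess() {
    // The fix: never let the promise die while still pending.
    if (discardInDestructor_) {
      promise_.discard();
    }
  }

  ToyPromise& promise() { return promise_; }

private:
  bool discardInDestructor_;
  ToyPromise promise_;
};

// Replays Race 2: the process is destroyed before its deferred
// discarded() handler gets a chance to run. Returns whether the
// waiting continuation was invoked.
bool replayRace2(bool discardInDestructor) {
  bool continuationRan = false;
  {
    ToyAuthenticatorProcess process(discardInDestructor);
    process.promise().onAny(
        [&continuationRan](State) { continuationRan = true; });
    // Leaving this scope models terminate() jumping the queue: the
    // process is destroyed with the promise still pending.
  }
  return continuationRan;
}
```

Here `replayRace2(false)` returns false because the continuation is silently dropped, and `replayRace2(true)` returns true because the destructor discards the promise first.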