Neil Conway created MESOS-6869:
----------------------------------
Summary: Master should not shutdown agents that re-register
without authenticating
Key: MESOS-6869
URL: https://issues.apache.org/jira/browse/MESOS-6869
Project: Mesos
Issue Type: Bug
Components: master
Reporter: Neil Conway
Priority: Minor
Currently, the master sends a {{ShutdownMessage}} to the agent if the agent
attempts to register or re-register without first authenticating:
https://github.com/apache/mesos/blob/be127e6eca1312bf8c2b039646f6909fa42cd342/src/master/master.cpp#L5083
https://github.com/apache/mesos/blob/be127e6eca1312bf8c2b039646f6909fa42cd342/src/master/master.cpp#L5268
This produces bad behavior in this scenario:
* Agent authenticates successfully
* Agent's registration / re-registration message is dropped. Agent will then
attempt to register / re-register again in a loop with backoff.
* Master eventually marks the agent unreachable because it fails health checks.
This clears the {{authenticated}} state for the agent.
* Simultaneously, a re-registration attempt from the agent reaches the master.
The master responds with {{ShutdownMessage}}, which is bad because shutting
down the agent terminates all of its tasks.
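
The sequence above can be replayed against a toy model of the master's
bookkeeping. This is only an illustrative sketch, not Mesos code: the
{{ToyMaster}} class, its method names, the "agent-1" ID, and the "accepted"
return value are all hypothetical, and only {{ShutdownMessage}} and the
{{authenticated}} state come from the description.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMaster:
    """Hypothetical stand-in for the master's per-agent state."""
    authenticated: set = field(default_factory=set)
    registered: set = field(default_factory=set)

    def authenticate(self, agent_id):
        self.authenticated.add(agent_id)

    def mark_unreachable(self, agent_id):
        # Failing health checks clears the agent's authenticated state.
        self.authenticated.discard(agent_id)
        self.registered.discard(agent_id)

    def reregister(self, agent_id):
        # Behavior described in this issue: an unauthenticated
        # (re-)registration attempt is answered with a shutdown.
        if agent_id not in self.authenticated:
            return "ShutdownMessage"
        self.registered.add(agent_id)
        return "accepted"  # hypothetical label for a successful reply


def replay_scenario():
    master = ToyMaster()
    master.authenticate("agent-1")      # agent authenticates successfully
    # The first (re-)registration message is dropped, so the master
    # never calls reregister() for it.
    master.mark_unreachable("agent-1")  # health checks fail meanwhile
    # A retried re-registration finally reaches the master.
    return master.reregister("agent-1")
```

In this model, {{replay_scenario()}} ends with the master answering a
legitimately authenticated agent with {{ShutdownMessage}}, purely because of
the ordering of events.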
Instead, the master should ignore the registration attempt. However, that is
still problematic, because (a) the agent will never attempt to re-authenticate,
since it thinks it has already authenticated, and (b) the agent will continue
to respond to pings from the master and so will not clear its local "I have
authenticated" state.
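
To illustrate why simply ignoring the attempt is not enough, here is a second
toy sketch under the same caveats: {{ToyAgent}}, {{run_with_ignore_policy}},
and the "ignored" / "accepted" / "authenticated" labels are hypothetical and
do not correspond to real Mesos identifiers. It assumes the agent has already
been marked unreachable, so the master's view and the agent's view of
authentication disagree.

```python
class ToyAgent:
    """Hypothetical agent that only tracks its own belief about auth."""

    def __init__(self, believes_authenticated):
        self.believes_authenticated = believes_authenticated

    def next_action(self):
        # (a) An agent that thinks it is authenticated never re-authenticates.
        if self.believes_authenticated:
            return "reregister"
        return "authenticate"


def run_with_ignore_policy(attempts):
    # The master cleared this flag when it marked the agent unreachable.
    master_authenticated = False
    # (b) The agent still answers pings, so nothing resets its local flag.
    agent = ToyAgent(believes_authenticated=True)

    log = []
    for _ in range(attempts):
        action = agent.next_action()
        if action == "authenticate":
            master_authenticated = True
            agent.believes_authenticated = True
            log.append("authenticated")
        elif master_authenticated:
            log.append("accepted")
        else:
            # Proposed behavior: drop the attempt instead of shutting down.
            log.append("ignored")
    return log
```

Every retry in this model is ignored, however many attempts are made, because
nothing ever prompts the agent to clear its local flag and authenticate again.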