[
https://issues.apache.org/jira/browse/MESOS-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adam B updated MESOS-7215:
--------------------------
Target Version/s: 1.2.3, 1.3.2, 1.4.0 (was: 1.2.2, 1.3.2, 1.4.0)
> Race condition on re-registration of non-partition-aware frameworks
> -------------------------------------------------------------------
>
> Key: MESOS-7215
> URL: https://issues.apache.org/jira/browse/MESOS-7215
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 1.2.0
> Reporter: Yan Xu
> Assignee: Megha Sharma
> Priority: Critical
>
> Prior to the partition-awareness work in MESOS-5344, when an agent reregistered
> after having been removed, the master only sent {{ShutdownFrameworkMessages}} to
> the agent for frameworks that it knew had been torn down.
> With the new logic in MESOS-5344, Mesos now sends
> {{ShutdownFrameworkMessages}} to the agent for all non-partition-aware
> frameworks (including the ones that are still registered).
> This is problematic. Offers from this agent can still go to the same
> framework, which can then launch new tasks. The agent then receives tasks for
> that framework and ignores them because it thinks the framework is
> shutting down. The framework is of course not shutting down, so from the
> master's and the scheduler's perspective the task stays in STAGING
> until the next agent reregistration, which could happen much later.
> This also makes the semantics of {{ShutdownFrameworkMessage}} ambiguous: the
> agent assumes the framework is going away (and acts accordingly) when
> it is not.
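
The sequence described above can be illustrated with a small toy model. This is only a sketch and not Mesos code: the `Framework`, `Agent` and `Master` classes, their methods, and the `new_logic` flag are hypothetical names invented for illustration, and the model assumes the agent silently drops tasks for any framework it believes is shutting down. The task state names follow the Mesos `TASK_STAGING` / `TASK_RUNNING` convention.

```python
from dataclasses import dataclass, field


@dataclass
class Framework:
    partition_aware: bool
    registered: bool  # False means the framework has been torn down


@dataclass
class Agent:
    """Toy agent: remembers which frameworks it was told to shut down."""
    shutting_down: set = field(default_factory=set)
    tasks: list = field(default_factory=list)

    def shutdown_framework(self, framework_id):
        self.shutting_down.add(framework_id)

    def run_task(self, framework_id, task_id):
        # Assumption: tasks for a framework believed to be shutting down
        # are ignored, and nothing is reported back to the master.
        if framework_id in self.shutting_down:
            return False
        self.tasks.append(task_id)
        return True


@dataclass
class Master:
    """Toy master: decides which frameworks to shut down on reregistration."""
    frameworks: dict
    task_states: dict = field(default_factory=dict)

    def agent_reregistered(self, agent, new_logic):
        for framework_id, fw in self.frameworks.items():
            torn_down = not fw.registered
            # Old behaviour: only torn-down frameworks are shut down.
            # New behaviour: every non-partition-aware framework is too,
            # even if it is still registered.
            if torn_down or (new_logic and not fw.partition_aware):
                agent.shutdown_framework(framework_id)

    def launch(self, agent, framework_id, task_id):
        self.task_states[task_id] = "TASK_STAGING"
        if agent.run_task(framework_id, task_id):
            self.task_states[task_id] = "TASK_RUNNING"


def simulate(new_logic):
    # One framework that is still registered but not partition-aware.
    master = Master({"fw-1": Framework(partition_aware=False, registered=True)})
    agent = Agent()
    master.agent_reregistered(agent, new_logic)
    master.launch(agent, "fw-1", "task-1")
    return master.task_states["task-1"]


if __name__ == "__main__":
    print("old logic:", simulate(new_logic=False))
    print("new logic:", simulate(new_logic=True))
```

In this model the task reaches `TASK_RUNNING` under the old behaviour, while under the new behaviour it never leaves `TASK_STAGING`, because the master has no signal that the agent dropped it.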