[ https://issues.apache.org/jira/browse/MESOS-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Park updated MESOS-7215:
--------------------------------
    Target Version/s: 1.2.2, 1.4.0, 1.3.2  (was: 1.2.2, 1.4.0)

> Race condition on re-registration of non-partition-aware frameworks
> -------------------------------------------------------------------
>
>                 Key: MESOS-7215
>                 URL: https://issues.apache.org/jira/browse/MESOS-7215
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Yan Xu
>            Assignee: Megha Sharma
>            Priority: Critical
>
> Prior to the partition-awareness work (MESOS-5344), when an agent 
> reregistered after having been removed, the master only sent 
> {{ShutdownFrameworkMessage}}s to the agent for frameworks that it knew had 
> been torn down. 
> With the new logic in MESOS-5344, the master now sends 
> {{ShutdownFrameworkMessage}}s to the agent for all non-partition-aware 
> frameworks (including the ones that are still registered).
> This is problematic. Offers for this agent's resources can still go to the 
> same framework, which can then launch new tasks. The agent then receives 
> tasks for that framework and ignores them because it thinks the framework is 
> shutting down. The framework is of course not shutting down, so from the 
> master's and the scheduler's perspective the task stays stuck in STAGING 
> until the next agent reregistration, which could happen much later.
> This also makes the semantics of {{ShutdownFrameworkMessage}} ambiguous: the 
> agent assumes the framework is going away (and acts accordingly) when it is 
> not. 
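
The sequence described above can be sketched as a small toy model. This is only an illustration of the reported behavior, not Mesos code: the names `Framework`, `Agent`, `frameworks_to_shut_down`, and `simulate` are hypothetical, and the model assumes that an agent silently drops tasks for any framework it believes is shutting down, as the description states.

```python
from dataclasses import dataclass, field
from enum import Enum


class TaskState(Enum):
    STAGING = "TASK_STAGING"
    RUNNING = "TASK_RUNNING"


@dataclass
class Framework:
    framework_id: str
    partition_aware: bool
    torn_down: bool = False


@dataclass
class Agent:
    # IDs of frameworks the agent believes are shutting down.
    shutting_down: set = field(default_factory=set)

    def shutdown_framework(self, framework_id):
        # Stand-in for the agent handling a ShutdownFrameworkMessage.
        self.shutting_down.add(framework_id)

    def run_task(self, framework_id):
        # Returns the status update the agent would send, or None if the
        # task is ignored because the framework is marked as shutting down.
        if framework_id in self.shutting_down:
            return None
        return TaskState.RUNNING


def frameworks_to_shut_down(frameworks, after_mesos_5344):
    # Which frameworks get a shutdown message when a removed agent reregisters.
    if after_mesos_5344:
        # Behavior reported in this ticket: all non-partition-aware frameworks.
        return [f for f in frameworks if not f.partition_aware]
    # Earlier behavior: only frameworks known to have been torn down.
    return [f for f in frameworks if f.torn_down]


def simulate(after_mesos_5344):
    # A framework that is still registered but not partition-aware.
    live = Framework("live-framework", partition_aware=False)
    agent = Agent()

    # 1. The removed agent reregisters; the master sends shutdown messages.
    for f in frameworks_to_shut_down([live], after_mesos_5344):
        agent.shutdown_framework(f.framework_id)

    # 2. The framework accepts an offer for this agent and launches a task;
    #    the master records the task as STAGING.
    master_view = TaskState.STAGING

    # 3. The agent either runs the task or ignores it without any update.
    update = agent.run_task(live.framework_id)
    if update is not None:
        master_view = update
    return master_view
```

In this model, `simulate(after_mesos_5344=True)` leaves the master's view of the task at `TaskState.STAGING`, because the agent never sends an update, while `simulate(after_mesos_5344=False)` moves it to `TaskState.RUNNING`.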



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
