[ 
https://issues.apache.org/jira/browse/MESOS-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088947#comment-14088947
 ] 

Adam B commented on MESOS-1630:
-------------------------------

I can interpret a couple of problem scenarios from this JIRA's description:

1) Imagine if a slave was running tasks while disconnected during the period 
when the framework was unregistered and then re-registered. In that case, all 
the other tasks for that framework on other connected slaves have been 
shut down, and one would think that we would want all the old tasks to shut 
down, since the newly re-registered framework wouldn't necessarily know about 
them (and might even have new FrameworkInfo values).

2) This ticket may be describing a problem where new tasks launched by the 
re-registered framework are being shut down when a slave starts running a new 
task and then re-registers/reconciles, since the frameworkId was added to 
frameworks.registered but never removed from frameworks.completed. In this 
case, I agree that the framework should not be considered "completed" and 
the new task needs to keep running.

But even if we're just trying to solve scenario 2 by calling 
frameworks.completed.erase in Master::addFramework or _reregisterFramework, we 
still have to consider how to handle scenario 1.
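
To make scenario 2 concrete, here is a minimal sketch of the bookkeeping involved. It is not the Mesos master code: ToyMaster, its two sets, and reconcileWouldShutdown are hypothetical stand-ins for Master, frameworks.registered / frameworks.completed, and the check in Master::reconcile, and the real containers hold more than plain IDs.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, heavily simplified stand-in for the master's bookkeeping.
// Only framework IDs are tracked; the real master stores Framework structs.
struct ToyMaster {
  std::set<std::string> registered;  // models frameworks.registered
  std::set<std::string> completed;   // models frameworks.completed

  // Framework goes away: move its ID from "registered" to "completed".
  void unregisterFramework(const std::string& id) {
    registered.erase(id);
    completed.insert(id);
  }

  // Framework comes back with the same ID.
  // eraseCompleted == false models the behaviour described in the ticket;
  // eraseCompleted == true models the proposed frameworks.completed.erase.
  void reregisterFramework(const std::string& id, bool eraseCompleted) {
    registered.insert(id);
    if (eraseCompleted) {
      completed.erase(id);
      assert(completed.count(id) == 0);
    }
  }

  // Models the decision made when a slave re-registers and reports tasks
  // for framework `id`: true means "tell the slave to shut it down".
  bool reconcileWouldShutdown(const std::string& id) const {
    return completed.count(id) > 0;
  }
};
```

Without the erase, an ID that has been unregistered and then re-registered sits in both sets, so the reconcile check still treats it as completed. Note that this sketch has no notion of which tasks predate the re-registration, so by itself the erase leaves scenario 1 unaddressed: old tasks on a previously disconnected slave would also survive.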

Implementation Detail:
- When we erase the re-registering Framework from frameworks.completed and add 
it to frameworks.registered, should we copy over completedExecutors, 
completedTasks, or the original registeredTime/unregisteredTime? The 
FrameworkInfo should probably come from the ReregisterMessage (not the old 
completed framework) though.
- Or maybe we start with the old Framework struct and just update the 
FrameworkInfo, pid, and reregisteredTime?
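
The two options above could be sketched roughly as follows. Everything here is illustrative: ToyFramework and ToyFrameworkInfo are hypothetical, flattened stand-ins for the real Framework struct and FrameworkInfo message, times are plain doubles, and tasks/executors are reduced to ID strings. In particular, resetting registeredTime to "now" when history is not copied is an assumption of the sketch, not something settled in this ticket.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for FrameworkInfo (only two illustrative fields).
struct ToyFrameworkInfo {
  std::string name;
  std::string user;
};

// Hypothetical stand-in for the master's Framework struct.
struct ToyFramework {
  ToyFrameworkInfo info;
  std::string pid;
  double registeredTime = 0;
  double reregisteredTime = 0;
  double unregisteredTime = 0;
  std::vector<std::string> completedTasks;
  std::vector<std::string> completedExecutors;
};

// Option 1: build a new struct from the re-register message and
// optionally copy history over from the old completed framework.
ToyFramework buildFresh(const ToyFramework& old,
                        const ToyFrameworkInfo& info,
                        const std::string& pid,
                        double now,
                        bool copyHistory) {
  ToyFramework f;
  f.info = info;  // always taken from the message, not from `old`
  f.pid = pid;
  f.registeredTime = now;  // assumption: start over unless history is copied
  f.reregisteredTime = now;
  if (copyHistory) {
    f.registeredTime = old.registeredTime;
    f.unregisteredTime = old.unregisteredTime;
    f.completedTasks = old.completedTasks;
    f.completedExecutors = old.completedExecutors;
  }
  return f;
}

// Option 2: start from the old struct and update only
// FrameworkInfo, pid, and reregisteredTime.
ToyFramework reuseOld(ToyFramework old,
                      const ToyFrameworkInfo& info,
                      const std::string& pid,
                      double now) {
  old.info = info;
  old.pid = pid;
  old.reregisteredTime = now;
  return old;
}
```

In this sketch, buildFresh with copyHistory set and reuseOld produce the same result; the options only diverge if some fields of the old struct are deliberately left behind.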

> Remove framework from completedFrameworks if framework re-registers.
> --------------------------------------------------------------------
>
>                 Key: MESOS-1630
>                 URL: https://issues.apache.org/jira/browse/MESOS-1630
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.14.0, 0.14.1, 0.14.2, 0.17.0, 0.16.0, 0.15.0, 0.18.0, 
> 0.18.1, 0.18.2, 0.19.0, 0.19.1
>            Reporter: Benjamin Hindman
>            Assignee: Bernd Mathiske
>            Priority: Critical
>
> If a framework gets removed, for example, because it unregisters with the 
> master (i.e., due to MESOS-1550), but then the same framework ID is reused 
> when a framework re-registers (which we currently allow) then we should 
> remove the framework from Master::frameworks.completed otherwise when a slave 
> re-registers then in Master::reconcile we'll notice that the slave is running 
> tasks from a "completed" framework and tell the slave to shutdown that 
> framework, thus shutting down all of the tasks.
> This should be easily fixed by removing the framework from 
> frameworks.completed when a framework re-registers with the same ID as a 
> completed framework. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
