Paul, what exactly do you want? Our current options are:

1) Shut down the scheduler (and/or RM) while leaving tasks/executors running. Then you can restart the scheduler/RM on the same or another node, reregister with the same frameworkId, and reconnect to all of those running tasks. This is the intended behavior of scheduler failover.

2) Shut down and unregister the scheduler (and RM), which will kill all tasks/executors and prevent the scheduler from reregistering with the same frameworkId (at least until the master fails over). This can be done from scheduler.stop() or from the master's administrative /shutdown endpoint. The intent is that you are done with this instance of the framework, and a new instance of the framework should register with a new frameworkId and launch all new tasks.

Maybe you want to use scheduler.stop() and clear out your dependency on the old frameworkId so that it doesn't try to reregister with the same frameworkId on restart?
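To make the two options concrete, here is a minimal sketch of the semantics. The real entry point is org.apache.mesos.MesosSchedulerDriver.stop(boolean failover); the StubDriver class and its fields below are hypothetical stand-ins (no Mesos dependency) used purely to illustrate what each flag value implies for running tasks and frameworkId reuse:

```java
// Stand-in for SchedulerDriver to illustrate stop(failover) semantics.
// This is NOT the Mesos API; it only models the two outcomes described above.
class StubDriver {
    boolean tasksRunning = true;
    boolean canReuseFrameworkId = true;

    // Mirrors SchedulerDriver.stop(boolean failover):
    //  - failover == true : scheduler disconnects; tasks/executors keep
    //    running for the framework's failover timeout, and a restarted
    //    scheduler may reregister with the same frameworkId (option 1).
    //  - failover == false: tasks/executors are killed and the frameworkId
    //    is retired; a restarted scheduler must register as a new
    //    framework with a new frameworkId (option 2).
    void stop(boolean failover) {
        if (!failover) {
            tasksRunning = false;
            canReuseFrameworkId = false;
        }
    }
}

public class ShutdownSemantics {
    public static void main(String[] args) {
        StubDriver failoverStop = new StubDriver();
        failoverStop.stop(true);   // option 1: scheduler failover
        System.out.println("failover stop: tasksRunning=" + failoverStop.tasksRunning
                + " reuseFrameworkId=" + failoverStop.canReuseFrameworkId);

        StubDriver teardown = new StubDriver();
        teardown.stop(false);      // option 2: full teardown
        System.out.println("teardown stop: tasksRunning=" + teardown.tasksRunning
                + " reuseFrameworkId=" + teardown.canReuseFrameworkId);
    }
}
```

So for Paul's case (slave stays up, executors and tasks die, framework can be restarted), stop(false) plus discarding the persisted frameworkId before restart is the closest fit.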
On Thu, Apr 9, 2015 at 9:12 AM, Paul Read <[email protected]> wrote:

> From notes on the web:
>
> "Stops the scheduler driver. If the failover flag is set to false then it is
> expected that this framework will never reconnect to Mesos and all of its
> executors and tasks can be terminated. Otherwise, all executors and tasks
> will remain running (for some framework specific failover timeout) allowing
> the scheduler to reconnect (possibly in the same process, or from a
> different process, for example, on a different machine)."
>
> Seems neither is what we want. Looks like I will have to keep doing it
> manually.
>
> On Thu, Apr 9, 2015 at 12:09 PM, Paul Read <[email protected]> wrote:
>
> > I don't think so. That seems to suggest a system may want to leave the
> > executors running and reconnect. It also indicates this is the behavior
> > when the slave dies. In our case the expected behavior is for the slave
> > to stay running and the executor(s) and tasks to die. I would think
> > restarting a framework would be an acceptable behavior.
> >
> > On Thu, Apr 9, 2015 at 12:02 PM, Brandon Gulla <[email protected]> wrote:
> >
> >> https://mail-archives.apache.org/mod_mbox/mesos-user/201503.mbox/%[email protected]%3E
> >>
> >> may be related
> >>
> >> On Thu, Apr 9, 2015 at 12:00 PM, Paul Read <[email protected]> wrote:
> >>
> >> > In testing the shutdown service I decided to go from manually stopping
> >> > the tasks and executor to using the mesosDriver.stop() API, which
> >> > indicates it will do just that: stop the tasks and executor. And it
> >> > does; however, from that point forward you cannot restart the RM and
> >> > have it communicate with Mesos. This seems odd to me. If I restart
> >> > Mesos I can then start a RM and flex up/down tasks.
> >> >
> >> > The mesosDriver.abort() does not shut down the tasks and/or executor.
> >> >
> >> > So is this a Mesos bug or feature?
> >> >
> >> > Thanks.
> >>
> >> --
> >> Brandon
