Re: Framework disconnect kills running tasks

Zameer Manji Wed, 10 Feb 2016 18:57:37 -0800

Setting `failover_timeout` is key. The Apache Aurora framework defaults
this value to 21 days to ensure there is no accidental destruction of tasks
in a production environment. FWIW, I think the Mesos default of 0 is terrible
and not desirable. I really think frameworks should opt in to this behaviour
rather than opt out: with the default, a minor ZK or network blip can cause
destruction of tasks.
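
For anyone who wants to see the two steps from Shuai's message (quoted below) in code, here is a minimal sketch. It assumes the framework ID is kept in a local file between runs, and it uses a plain dict as a stand-in for the FrameworkInfo PB so that it runs without the Mesos bindings. The helper names and the file layout are hypothetical; only the field names (`name`, `user`, `id.value`, `failover_timeout` in seconds) come from FrameworkInfo itself.

```python
# 21 days in seconds (the Aurora default mentioned above).
# FrameworkInfo.failover_timeout is expressed in seconds.
FAILOVER_TIMEOUT_SECS = 21 * 24 * 60 * 60


def load_framework_id(path):
    """Return the framework ID saved by a previous run, or None (hypothetical helper)."""
    try:
        with open(path) as f:
            return f.read().strip() or None
    except FileNotFoundError:
        return None


def save_framework_id(path, framework_id):
    """Persist the ID assigned by the master so the next run can reuse it."""
    with open(path, "w") as f:
        f.write(framework_id)


def framework_info_fields(name, user, id_path):
    """Build the FrameworkInfo fields as a plain dict (stand-in for the PB)."""
    fields = {
        "name": name,
        "user": user,
        # Step 1: a non-zero failover_timeout keeps tasks alive while the
        # framework is disconnected.
        "failover_timeout": float(FAILOVER_TIMEOUT_SECS),
    }
    # Step 2: reuse the previous framework ID if one was saved, so the master
    # treats this run as the same framework reconnecting.
    saved_id = load_framework_id(id_path)
    if saved_id is not None:
        fields["id"] = {"value": saved_id}
    return fields
```

In a real scheduler you would copy these values into the FrameworkInfo message before starting the driver, and call something like `save_framework_id` once the master has handed you an ID on first registration.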


On Wed, Feb 10, 2016 at 5:05 PM, Shuai Lin <[email protected]> wrote:

> Hi suppandi,
>
> To make sure your tasks survive framework restarts, you need to:
>
> 1. When registering your framework, set the `failover_timeout` attribute of
> the FrameworkInfo PB. This is how long the master waits for your
> framework to reconnect. By default it's 0, which is why your tasks are killed
> immediately when the framework exits.
>
> 2. When you reregister your framework, you need to use the same framework
> ID as the previous run, so that the master can identify it as the same
> framework reconnecting.
>
> Regards,
> Shuai
>
>
> On Thu, Feb 11, 2016 at 6:37 AM, suppandi <[email protected]> wrote:
>
> > Hi,
> >
> > I am trying to write my first framework and I wanted to test task
> > reconciliation. But whenever I kill my framework (with a kill -9), Mesos
> > seems to clean up the tasks by updating their state to TASK_KILLED.
> >
> > Is there a parameter when creating the framework or the task that makes
> > this happen? I want my task to remain alive when the framework is
> > disconnected/dead.
> >
> > Here is how I create my framework:
> > https://gist.github.com/anonymous/3357783ce938c4293947
> >
> > and here is how I create my task:
> > https://gist.github.com/anonymous/d35f917ade791127f4c5
> >
> > Thanks
> > suppandi
> >
>
> --
> Zameer Manji
>
>
