I recently posted a similar question to the user list to better understand
how slave recovery works. You can read the thread at
http://mail-archives.apache.org/mod_mbox/mesos-user/201506.mbox/browser

Quoting Vinod from that thread:

> 'recovery_timeout' was added to make sure that if a slave
> is down for a long time (>10 mins), the executors commit suicide. It is
> better for the executor/task to die than keep running because the framework
> might have already launched another replica of that instance. This was not
> tied to the 75s timeout (hard coded) because it is possible for a slave to
> successfully re-register with a master after 75s (e.g., both master and
> slave are down for 5 min).

Adam also replied with a ticket that will allow the 75s ping timeout to be
configurable in future releases (appears to be 0.23.0 and onward):
https://issues.apache.org/jira/browse/MESOS-2110
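
To make the interaction between the two timeouts concrete, here is a toy
model of how I read Vinod's explanation. It is only a sketch, not Mesos
code: the function and constant names are made up, the 75s and 15min values
are the ones quoted in this thread, and it assumes any master outage falls
entirely inside the slave outage.

```python
from datetime import timedelta

# Values quoted in this thread; the names are illustrative, not Mesos identifiers.
HEALTH_CHECK_TIMEOUT = timedelta(seconds=75)  # hard-coded master-side timeout
RECOVERY_TIMEOUT = timedelta(minutes=15)      # default of the slave's recovery_timeout flag


def outcome(slave_down, master_down=timedelta(0)):
    """Toy model of what happens to executors after a slave outage.

    Assumes any master outage falls entirely inside the slave outage.
    """
    # Executors give up on their own once the slave has been gone too long.
    if slave_down > RECOVERY_TIMEOUT:
        return "executors self-terminate"
    # The master can only count missed health checks while it is itself up.
    observed = max(slave_down - master_down, timedelta(0))
    if observed > HEALTH_CHECK_TIMEOUT:
        return "master shuts down slave"
    return "slave re-registers"
```

With both master and slave down for 5 minutes (Vinod's example),
outcome(timedelta(minutes=5), timedelta(minutes=5)) lands in the
"slave re-registers" branch even though 75s has long passed, which is why
the two timeouts are not tied together.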

As for shutting down the mesos-slave daemon, I (personally) don't think
that it's really a problem. There are various tools (Puppet, Monit, etc.)
that let you define a service's desired state and will bring the daemon
back up if it exits.

-- Roger

On Tue, Jun 30, 2015 at 3:27 AM, An an Zhao <[email protected]> wrote:

> Hi,
>     According to the documentation, the master currently shuts down the
> slave when re-registration times out:
>
> > If the slave takes longer than this timeout to re-register, the master
> > shuts down the slave, which in turn shuts down any live executors/tasks.
>
> *1.* I think it would be friendlier and more direct for the slave to kill
> only the executors without exiting, and then start registering again.
>      On the other hand, supporting this would take some effort, so maybe
> it's not worth it.
>      What's your opinion?
>
> *2.* The slave has a flag, recovery_timeout, which is 15 min by default.
> However, the slave will also fail to re-register and kill the executors
> when it takes longer than the health check timeout (which is 75s), so the
> executors are useless after 75s.
>    *I'm wondering why recovery_timeout is 15 min by default. I think
> 75s is enough.* Is this a good idea?
>
>
>    Thanks for your time.
>
> Best regards.
>
