I recently posted a similar question to the user list to better understand how slave recovery works. You can read the thread at http://mail-archives.apache.org/mod_mbox/mesos-user/201506.mbox/browser

Quoting Vinod from that thread:

> 'recovery_timeout' was added to make sure that if a slave
> is down for a long time (>10 mins), the executors commit suicide. It is
> better for the executor/task to die than keep running because the framework
> might have already launched another replica of that instance. This was not
> tied to the 75s timeout (hard coded) because it is possible for a slave to
> successfully re-register with a master after 75s (e.g., both master and
> slave are down for 5 min).

Adam also replied with a ticket that will allow the 75s ping timeout to be configurable in future releases (appears to be 0.23.0 and onward):
https://issues.apache.org/jira/browse/MESOS-2110

As for shutting down the mesos-slave daemon, I (personally) don't think that it's really a problem. There are various tools (Puppet, Monit, etc.) that allow you to define a service's desired state.

--
Roger

On Tue, Jun 30, 2015 at 3:27 AM, An an Zhao <[email protected]> wrote:

> Hi,
> For now, master would kill the slave when re-registering timeout
> according to the document.
>
> > If the slave takes longer than this timeout to re-register, the master
> shuts down the slave, which in turn shuts down any live executors/tasks.
>
> *1.* I think it's more friendly and directly that the slave only kill the
> executors without exiting, after that the slave start register.
> On the other hand, It would take some effort to support this, maybe
> it's not worth.
> What's your opinion?
>
> *2.* The slave has a flag recovery_timeout which is 15min by default.
> Also the slave will fail to re-register and kill the executors when it
> takes longer than the health check timeout ( which is 75s). So the
> executors are useless after 75s.
> *I'm wondering why the recovery_timeout is 15min by default. I think
> that 75s is enough.* Is this a good idea?
>
>
> Thanks for your time.
>
> Best regards.
>
