Re: MESOS-6233 Allow agents to re-register post a host reboot

2016-12-12 Thread Joris Van Remoortere
> So one thing that was brought up during offline conversations was that if the host reboot is associated with hardware change (e.g., a new memory stick):
> - With the change: the agent could run into incompatible agent info due to resource change and flap

Re: MESOS-6233 Allow agents to re-register post a host reboot

2016-12-04 Thread haosdent
> we can have the agent remove `rm -f /meta/slaves/latest` automatically upon recovery failure but only after the host has rebooted.
This sounds dangerous. When the difference in AgentInfo is caused by an operator's typo, I think the operator would prefer to correct it and try to start the agent again.

Re: MESOS-6233 Allow agents to re-register post a host reboot

2016-11-29 Thread tommy xiao
Agree with James's options.
2016-11-30 0:48 GMT+08:00 James Peach:
> > On Nov 28, 2016, at 6:09 PM, Yan Xu wrote:
> >
> > So one thing that was brought up during offline conversations was that if the host reboot is associated with hardware change (e.g.,

Re: MESOS-6233 Allow agents to re-register post a host reboot

2016-11-29 Thread James Peach
> On Nov 28, 2016, at 6:09 PM, Yan Xu wrote:
>
> So one thing that was brought up during offline conversations was that if the host reboot is associated with hardware change (e.g., a new memory stick):
>
> • Currently: the agent would skip the recovery (and the

Re: MESOS-6233 Allow agents to re-register post a host reboot

2016-11-28 Thread Yan Xu
So one thing that was brought up during offline conversations was that if the host reboot is associated with hardware change (e.g., a new memory stick):
- Currently: the agent would skip the recovery (and the chance of running into incompatible agent info) and register as a new agent.
-

MESOS-6233 Allow agents to re-register post a host reboot

2016-11-15 Thread Megha Sharma
Hi All,
We have been working on the design for Restartable tasks (MESOS-3545), and allowing agents to recover and re-register after a reboot is a prerequisite for that. Today the agent doesn't recover its state, which includes its SlaveID, after a host reboot; it short-circuits the recovery upon