>
> So one thing that was brought up during offline conversations was that if
> the host reboot is associated with hardware change (e.g., a new memory
> stick):
> - With the change: the agent could run into incompatible agent info
>   due to resource change and flap
>
>
> we can have the agent run `rm -f /meta/slaves/latest`
> automatically upon recovery failure, but only after the host has rebooted.
This sounds dangerous. When the difference in AgentInfo is caused by an
operator's typo, I think the operator would prefer to correct it and try
to start the agent again.
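
To make the concern concrete, here is a rough sketch of the quoted proposal in
Python. It is not Mesos code: the function names are hypothetical, and
detecting a reboot by comparing a checkpointed boot ID against the current one
is an assumption made for illustration only.

```python
import os


def should_discard_checkpoint(recovery_failed: bool,
                              checkpointed_boot_id: str,
                              current_boot_id: str) -> bool:
    """Hypothetical policy: discard state only if recovery failed AND the host rebooted."""
    # Assumption: a changed boot ID is how we learn that the host rebooted.
    rebooted = checkpointed_boot_id != current_boot_id
    return recovery_failed and rebooted


def discard_latest(meta_dir: str) -> bool:
    """Remove the `slaves/latest` symlink under meta_dir, like `rm -f`.

    Returns True if a link was removed, False if there was nothing to remove.
    """
    latest = os.path.join(meta_dir, "slaves", "latest")
    try:
        os.unlink(latest)  # removes the symlink itself, not what it points to
        return True
    except FileNotFoundError:
        return False
```

Note that the policy only looks at "recovery failed" and "host rebooted". It
cannot tell a hardware change from a typo in the agent flags, so a typo made
while restarting the agent after a reboot would also wipe the link.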
Agree with James's options.
2016-11-30 0:48 GMT+08:00 James Peach:
>
> > On Nov 28, 2016, at 6:09 PM, Yan Xu wrote:
> >
> > So one thing that was brought up during offline conversations was that
> if the host reboot is associated with hardware change (e.g.,
> On Nov 28, 2016, at 6:09 PM, Yan Xu wrote:
>
> So one thing that was brought up during offline conversations was that if the
> host reboot is associated with hardware change (e.g., a new memory stick):
>
> • Currently: the agent would skip the recovery (and the
So one thing that was brought up during offline conversations was that if
the host reboot is associated with hardware change (e.g., a new memory
stick):
- Currently: the agent would skip the recovery (and the chance of
running into incompatible agent info) and register as a new agent.
-
Hi All,
We have been working on the design for Restartable tasks (MESOS-3545), and
allowing agents to recover and re-register post reboot is a prerequisite for
that.
Today the agent doesn't recover its state, which includes its SlaveID, after a
host reboot; it short-circuits the recovery upon