agree with james's options. 2016-11-30 0:48 GMT+08:00 James Peach <[email protected]>:
> > > On Nov 28, 2016, at 6:09 PM, Yan Xu <[email protected]> wrote: > > > > So one thing that was brought up during offline conversations was that > if the host reboot is associated with hardware change (e.g., a new memory > stick): > > > > • Currently: the agent would skip the recovery (and the chance of > running into incompatible agent info) and register as a new agent. > > • With the change: the agent could run into incompatible agent > info due to resource change and flap indefinitely until the operator > intervenes. > > > > To mitigate this and maintain the current behavior, we can have the > agent remove `rm -f <work_dir>/meta/slaves/latest` automatically upon > recovery failure but only after the host has rebooted. This way the agent > can restart as a new agent without operator intervention. > > > > Any thoughts? > > I still think you need a mechanism for the master/agent to tell you > whether it will honor the restart policy. Without this, you have to lock > the framework to a Mesos version. > > An empty RestartPolicy is also problematic since it precludes using > RestartPolicy in pods. If you later want to restart a task inside a pod but > not across agent restarts you would have no way to express that. > > J -- Deshi Xiao Twitter: xds2000 E-mail: xiaods(AT)gmail.com
