> On Aug. 5, 2013, 8:57 p.m., Brenden Matthews wrote:
> > Is there not a better way to handle this process in an automated fashion?
> > It seems to require user intervention if the slave gets into a bad state.
>
> Vinod Kone wrote:
> That's a good point. The reason we wanted explicit intervention was to
> help us diagnose/fix issues with slave recovery easily. Once we deem slave
> recovery stable we could probably automate some of these decisions (maybe via
> a flag). Thoughts?
>
> Brenden Matthews wrote:
> This sounds reasonable. In production there will be cases where it will
> fail to recover, and the slave should take a reasonable course of action to
> return to an operable state.
>
> Perhaps --recovery_failure_action={continue,abort} ? Can we not also just
> use --strict? As in, if --no-strict is set, it should continue as far as
> possible.
>
> Vinod Kone wrote:
> I would like to wait on adding a flag/option for auto recovery after we
> get some data from testing. Let me know if it's causing enough of a pain for
> you guys. And, hopefully you're not running it in production already?
Just running it in staging at the moment. We'd like to have it, but there
isn't an immediate concern.
- Brenden
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13261/#review24673
-----------------------------------------------------------
On Aug. 6, 2013, 5:54 p.m., Vinod Kone wrote:
>
> (Updated Aug. 6, 2013, 5:54 p.m.)
>
>
> Review request for mesos, Benjamin Hindman and Ben Mahler.
>
>
> Bugs: MESOS-613
> https://issues.apache.org/jira/browse/MESOS-613
>
>
> Repository: mesos-git
>
>
> Description
> -------
>
> See summary.
>
>
> Diffs
> -----
>
> src/slave/slave.cpp 9cd7754b647dde21267f1990edb7d4e1425beacd
>
> Diff: https://reviews.apache.org/r/13261/diff/
>
>
> Testing
> -------
>
> make check
>
>
> Thanks,
>
> Vinod Kone