> On Aug. 5, 2013, 8:57 p.m., Brenden Matthews wrote:
> > Is there not a better way to handle this process in an automated fashion?
> > It seems to require user intervention if the slave gets into a bad state.
>
> Vinod Kone wrote:
> That's a good point. The reason we wanted explicit intervention was to
> help us easily diagnose and fix issues with slave recovery. Once we deem
> slave recovery stable, we could probably automate some of these decisions
> (maybe via a flag). Thoughts?
This sounds reasonable. In production there will be cases where the slave
fails to recover, and it should then take a reasonable course of action to
return to an operable state.
Perhaps --recovery_failure_action={continue,abort}? Could we not also just use
--strict? That is, if --no-strict is set, the slave should continue recovering
as far as possible.
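To make the two options concrete, here is a minimal C++ sketch of how either
flag could drive the slave's behavior when one piece of checkpointed state
fails to recover. It is hypothetical: RecoveryFailureAction,
parseRecoveryFailureAction, actionFromStrict and recoverAll are illustrative
names, not existing Mesos code, and each recovery result is modeled as a
plain bool.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical: the two actions the proposed
// --recovery_failure_action={continue,abort} flag would select between.
enum class RecoveryFailureAction { CONTINUE, ABORT };

// Parses the proposed flag value. Returns std::nullopt for anything else,
// so the caller can reject a bad command line instead of guessing.
std::optional<RecoveryFailureAction> parseRecoveryFailureAction(
    const std::string& value) {
  if (value == "continue") return RecoveryFailureAction::CONTINUE;
  if (value == "abort") return RecoveryFailureAction::ABORT;
  return std::nullopt;
}

// Alternative: reuse a boolean --strict flag instead of adding a new one.
// --strict treats any recovery error as fatal; --no-strict keeps going and
// recovers as much state as possible.
RecoveryFailureAction actionFromStrict(bool strict) {
  return strict ? RecoveryFailureAction::ABORT
                : RecoveryFailureAction::CONTINUE;
}

// Hypothetical recovery loop: each entry says whether one piece of
// checkpointed state (e.g. one executor) was recovered successfully.
// Returns the number of items recovered, or -1 if recovery was aborted.
int recoverAll(const std::vector<bool>& results,
               RecoveryFailureAction action) {
  int recovered = 0;
  for (bool ok : results) {
    if (ok) {
      ++recovered;
      continue;
    }
    if (action == RecoveryFailureAction::ABORT) {
      return -1;  // Stop and leave the slave for an operator to inspect.
    }
    // CONTINUE: skip the unrecoverable item and keep going.
  }
  return recovered;
}
```

With --no-strict (CONTINUE), recoverAll({true, false, true}, ...) recovers 2
items and the slave keeps running; with --strict (ABORT) the same input stops
recovery, which preserves the explicit-intervention behavior for debugging.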
- Brenden
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13261/#review24673
-----------------------------------------------------------
On Aug. 3, 2013, 9:35 p.m., Vinod Kone wrote:
>
> (Updated Aug. 3, 2013, 9:35 p.m.)
>
>
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Brenden Matthews.
>
>
> Bugs: MESOS-613
> https://issues.apache.org/jira/browse/MESOS-613
>
>
> Repository: mesos-git
>
>
> Description
> -------
>
> See summary.
>
>
> Diffs
> -----
>
> src/slave/slave.cpp 7f6e6b456890db438092f19a22e4dd816bb33d04
>
> Diff: https://reviews.apache.org/r/13261/diff/
>
>
> Testing
> -------
>
> make check
>
>
> Thanks,
>
> Vinod Kone
>
>