Re: Review Request 13261: Clarified the guidance for users when slave recovery fails.

Brenden Matthews Mon, 05 Aug 2013 15:58:25 -0700


> On Aug. 5, 2013, 8:57 p.m., Brenden Matthews wrote:
> > Is there not a better way to handle this process in an automated fashion?  
> > It seems to require user intervention if the slave gets into a bad state.
> 
> Vinod Kone wrote:
>     That's a good point. The reason we wanted explicit intervention was to 
> help us diagnose/fix issues with slave recovery easily. Once we deem slave 
> recovery stable we could probably automate some of these decisions (maybe via 
> a flag). Thoughts?


This sounds reasonable.  In production there will be cases where the slave 
fails to recover, and it should then take a reasonable course of action to 
return to an operable state.

Perhaps --recovery_failure_action={continue,abort}? Could we not also just use 
--strict?  That is, if --no-strict is set, recovery should continue as far as 
possible.
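
To make the two options concrete, here is a rough sketch of how the decision 
could look. Everything in it is hypothetical: --recovery_failure_action is only 
the flag proposed above, the names RecoveryFailureAction, parseAction, 
actionFromStrict and shouldContinue are made up for illustration, and mapping 
--strict to "abort" is my assumption rather than anything in slave.cpp today.

```cpp
#include <cassert>
#include <string>

// Hypothetical: the two values of the proposed --recovery_failure_action flag.
enum class RecoveryFailureAction { Continue, Abort };

// Parses the proposed flag value. Returns false (leaving *out untouched)
// for anything other than "continue" or "abort".
bool parseAction(const std::string& value, RecoveryFailureAction* out) {
  assert(out != nullptr);
  if (value == "continue") { *out = RecoveryFailureAction::Continue; return true; }
  if (value == "abort") { *out = RecoveryFailureAction::Abort; return true; }
  return false;
}

// Alternative: derive the action from --strict / --no-strict instead of a
// new flag. Assumption: strict means abort, non-strict means continue.
RecoveryFailureAction actionFromStrict(bool strict) {
  return strict ? RecoveryFailureAction::Abort
                : RecoveryFailureAction::Continue;
}

// True if the slave should keep going after a recovery failure.
bool shouldContinue(RecoveryFailureAction action) {
  return action == RecoveryFailureAction::Continue;
}
```

With the second option no new flag is needed: shouldContinue(actionFromStrict(false)) 
is true, so a slave started with --no-strict would carry on, while one started 
with --strict would stop and wait for an operator.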


- Brenden


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13261/#review24673
-----------------------------------------------------------


On Aug. 3, 2013, 9:35 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/13261/
> -----------------------------------------------------------
> 
> (Updated Aug. 3, 2013, 9:35 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Brenden Matthews.
> 
> 
> Bugs: MESOS-613
>     https://issues.apache.org/jira/browse/MESOS-613
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.cpp 7f6e6b456890db438092f19a22e4dd816bb33d04 
> 
> Diff: https://reviews.apache.org/r/13261/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>
