> On Aug. 5, 2013, 8:57 p.m., Brenden Matthews wrote:
> > Is there not a better way to handle this process in an automated fashion?  
> > It seems to require user intervention if the slave gets into a bad state.
> 
> Vinod Kone wrote:
>     That's a good point. The reason we wanted explicit intervention was to 
> help us diagnose/fix issues with slave recovery easily. Once we deem slave 
> recovery stable we could probably automate some of these decisions (maybe via 
> a flag). Thoughts?
> 
> Brenden Matthews wrote:
>     This sounds reasonable.  In production there will be cases where it will 
> fail to recover, and the slave should take a reasonable course of action to 
> return to an operable state.
>     
>     Perhaps --recovery_failure_action={continue,abort} ? Can we not also just 
> use --strict?  As in, if --no-strict is set, it should continue as far as 
> possible.
> 
> Vinod Kone wrote:
>     I would like to wait on adding a flag/option for auto recovery after we 
> get some data from testing. Let me know if it's causing enough of a pain for 
> you guys. And, hopefully you're not running it in production already?

Just running it in staging at the moment.  We'd like to have it, but there 
isn't an immediate concern.


- Brenden


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13261/#review24673
-----------------------------------------------------------


On Aug. 6, 2013, 5:54 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/13261/
> -----------------------------------------------------------
> 
> (Updated Aug. 6, 2013, 5:54 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Bugs: MESOS-613
>     https://issues.apache.org/jira/browse/MESOS-613
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.cpp 9cd7754b647dde21267f1990edb7d4e1425beacd 
> 
> Diff: https://reviews.apache.org/r/13261/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>
