> On April 9, 2014, 11:04 p.m., Vinod Kone wrote:
> > src/master/flags.hpp, lines 81-90
> > <https://reviews.apache.org/r/20097/diff/1/?file=552174#file552174line81>
> >
> >     This is pretty verbose for a flag description. How about making the 
> > example a comment rather than include it in the description?
> 
> Ben Mahler wrote:
>     Ok, I removed the example in favor of more description tailoring to 
> operators. Let me know if you still think it's too verbose, I would like to 
> make sure that an operator can read this flag and understand its 
> implications. :)

I still think it is verbose :). Your call.

Personally I think these 2 sentences drive the point home

"Limit on the percentage of slaves that can be removed from the registry *and* 
shutdown after the re-registration timeout elapses. If the limit is exceeded, 
the master will fail over rather than remove the slaves."

More importantly, this re-registration timeout is only considered after a 
failover but not when a slave disconnects and re-registers with the same master 
correct? I think that needs to be mentioned.


> On April 9, 2014, 11:04 p.m., Vinod Kone wrote:
> > src/master/main.cpp, lines 164-187
> > <https://reviews.apache.org/r/20097/diff/1/?file=552175#file552175line164>
> >
> >     Is there a reason this is in main.cpp and not in master.cpp?
> >     
> >     The latter will make it easy to write tests for this in the future 
> > (when we have a Testable EXIT).
> 
> Ben Mahler wrote:
>     Yeah, the reason was that this logic is something that would be inside 
> the 'Percentage' flag abstraction.
>     
>     I was a bit hesitant to have it in the Master since this logic would go 
> away entirely were we to add 'Percentage'. Let me know if you still want me 
> to move it!

Most of the other flag validation checks (hostname, authenticate, credentials, 
weights) are in master.cpp, so it would be nice to have this validation 
alongside them? Does the move to master.cpp make it anymore difficult to kill 
this code when we have Percentage abstraction?


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20097/#review39956
-----------------------------------------------------------


On April 15, 2014, 1:28 a.m., Ben Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20097/
> -----------------------------------------------------------
> 
> (Updated April 15, 2014, 1:28 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Vinod Kone.
> 
> 
> Bugs: MESOS-764
>     https://issues.apache.org/jira/browse/MESOS-764
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> After we recover, we remove any slaves that do not re-register within a 
> timeout. As a safety measure, this patch adds a configurable limit on the 
> percentage of slaves that can be removed in this manner.
> 
> This provides safety guarantees to production operators to ensure that if 
> there are any unforeseen widespread failures, then the Master will continue 
> to failover as opposed to proceeding to remove a large percentage of slaves. 
> Operators can tune this for their environment.
> 
> The current default is catered towards non-production environments, but I've 
> added a TODO to explore adding a '--production' flag that will allow us to 
> use different defaults (< 100% removal limit, no auto-initialization, etc).
> 
> See the flag description for more details.
> 
> We should add a 'Percentage' abstraction in the future per MESOS-1162.
> 
> 
> Diffs
> -----
> 
>   src/master/constants.hpp 52d8d779e92f3be2b84d9237a9abbd2f580c0906 
>   src/master/constants.cpp 1cb8f22558b5bd3d90b24fcc6bf70cbe615a335c 
>   src/master/flags.hpp 024f86d93824a20ce42c28b8264576f1cb715d0e 
>   src/master/main.cpp f12f20a1eabd163c4f35056bf01f28f3edd408a9 
>   src/master/master.cpp 3c3c989543167afb7d368a19a16457ed00e6be0c 
> 
> Diff: https://reviews.apache.org/r/20097/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> It is currently not possible to exercise this case, given the use of EXIT.
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>

Reply via email to