> On April 9, 2014, 11:04 p.m., Vinod Kone wrote:
> > src/master/main.cpp, lines 164-187
> > <https://reviews.apache.org/r/20097/diff/1/?file=552175#file552175line164>
> >
> >     Is there a reason this is in main.cpp and not in master.cpp?
> >     
> >     The latter will make it easy to write tests for this in the future 
> > (when we have a Testable EXIT).
> 
> Ben Mahler wrote:
>     Yeah, the reason was that this logic is something that would be inside 
> the 'Percentage' flag abstraction.
>     
>     I was a bit hesitant to have it in the Master since this logic would go 
> away entirely were we to add 'Percentage'. Let me know if you still want me 
> to move it!
> 
> Vinod Kone wrote:
>     Most of the other flag validation checks (hostname, authenticate, 
> credentials, weights) are in master.cpp, so it would be nice to have this 
> validation alongside them? Does the move to master.cpp make it anymore 
> difficult to kill this code when we have Percentage abstraction?

Yeah that makes sense! Especially since this flag can come from different 
program entry points and I missed some of them! Did this cleanup for 
--registry_strict as well.


> On April 9, 2014, 11:04 p.m., Vinod Kone wrote:
> > src/master/flags.hpp, lines 81-90
> > <https://reviews.apache.org/r/20097/diff/1/?file=552174#file552174line81>
> >
> >     This is pretty verbose for a flag description. How about making the 
> > example a comment rather than include it in the description?
> 
> Ben Mahler wrote:
>     Ok, I removed the example in favor of more description tailoring to 
> operators. Let me know if you still think it's too verbose, I would like to 
> make sure that an operator can read this flag and understand its 
> implications. :)
> 
> Vinod Kone wrote:
>     I still think it is verbose :). Your call.
>     
>     Personally I think these 2 sentences drive the point home
>     
>     "Limit on the percentage of slaves that can be removed from the   
> registry *and* shutdown after the re-registration timeout elapses. If the 
> limit is exceeded, the master will fail over rather than remove the slaves."
>     
>     More importantly, this re-registration timeout is only considered after a 
> failover but not when a slave disconnects and re-registers with the same 
> master correct? I think that needs to be mentioned.

Good point, I'm not emphasizing the fact that this is only the removals after a 
failover. Re-structured to have the 2 sentences you mentioned together at the 
beginning as well.


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20097/#review39956
-----------------------------------------------------------


On April 15, 2014, 1:28 a.m., Ben Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20097/
> -----------------------------------------------------------
> 
> (Updated April 15, 2014, 1:28 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Vinod Kone.
> 
> 
> Bugs: MESOS-764
>     https://issues.apache.org/jira/browse/MESOS-764
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> After we recover, we remove any slaves that do not re-register within a 
> timeout. As a safety measure, this patch adds a configurable limit on the 
> percentage of slaves that can be removed in this manner.
> 
> This provides safety guarantees to production operators to ensure that if 
> there are any unforeseen widespread failures, then the Master will continue 
> to failover as opposed to proceeding to remove a large percentage of slaves. 
> Operators can tune this for their environment.
> 
> The current default is catered towards non-production environments, but I've 
> added a TODO to explore adding a '--production' flag that will allow us to 
> use different defaults (< 100% removal limit, no auto-initialization, etc).
> 
> See the flag description for more details.
> 
> We should add a 'Percentage' abstraction in the future per MESOS-1162.
> 
> 
> Diffs
> -----
> 
>   src/master/constants.hpp 52d8d779e92f3be2b84d9237a9abbd2f580c0906 
>   src/master/constants.cpp 1cb8f22558b5bd3d90b24fcc6bf70cbe615a335c 
>   src/master/flags.hpp 024f86d93824a20ce42c28b8264576f1cb715d0e 
>   src/master/main.cpp f12f20a1eabd163c4f35056bf01f28f3edd408a9 
>   src/master/master.cpp 3c3c989543167afb7d368a19a16457ed00e6be0c 
> 
> Diff: https://reviews.apache.org/r/20097/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> It is currently not possible to exercise this case, given the use of EXIT.
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>

Reply via email to