> On April 9, 2014, 11:04 p.m., Vinod Kone wrote: > > src/master/main.cpp, lines 164-187 > > <https://reviews.apache.org/r/20097/diff/1/?file=552175#file552175line164> > > > > Is there a reason this is in main.cpp and not in master.cpp? > > > > The latter will make it easy to write tests for this in the future > > (when we have a Testable EXIT). > > Ben Mahler wrote: > Yeah, the reason was that this logic is something that would be inside > the 'Percentage' flag abstraction. > > I was a bit hesitant to have it in the Master since this logic would go > away entirely were we to add 'Percentage'. Let me know if you still want me > to move it! > > Vinod Kone wrote: > Most of the other flag validation checks (hostname, authenticate, > credentials, weights) are in master.cpp, so it would be nice to have this > validation alongside them? Does the move to master.cpp make it anymore > difficult to kill this code when we have Percentage abstraction?
Yeah that makes sense! Especially since this flag can come from different program entry points and I missed some of them! Did this cleanup for --registry_strict as well. > On April 9, 2014, 11:04 p.m., Vinod Kone wrote: > > src/master/flags.hpp, lines 81-90 > > <https://reviews.apache.org/r/20097/diff/1/?file=552174#file552174line81> > > > > This is pretty verbose for a flag description. How about making the > > example a comment rather than include it in the description? > > Ben Mahler wrote: > Ok, I removed the example in favor of more description tailoring to > operators. Let me know if you still think it's too verbose, I would like to > make sure that an operator can read this flag and understand its > implications. :) > > Vinod Kone wrote: > I still think it is verbose :). Your call. > > Personally I think these 2 sentences drive the point home > > "Limit on the percentage of slaves that can be removed from the > registry *and* shutdown after the re-registration timeout elapses. If the > limit is exceeded, the master will fail over rather than remove the slaves." > > More importantly, this re-registration timeout is only considered after a > failover but not when a slave disconnects and re-registers with the same > master correct? I think that needs to be mentioned. Good point, I'm not emphasizing the fact that this is only the removals after a failover. Re-structured to have the 2 sentences you mentioned together at the beginning as well. - Ben ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/20097/#review39956 ----------------------------------------------------------- On April 15, 2014, 1:28 a.m., Ben Mahler wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/20097/ > ----------------------------------------------------------- > > (Updated April 15, 2014, 1:28 a.m.) > > > Review request for mesos, Benjamin Hindman and Vinod Kone. > > > Bugs: MESOS-764 > https://issues.apache.org/jira/browse/MESOS-764 > > > Repository: mesos-git > > > Description > ------- > > After we recover, we remove any slaves that do not re-register within a > timeout. As a safety measure, this patch adds a configurable limit on the > percentage of slaves that can be removed in this manner. > > This provides safety guarantees to production operators to ensure that if > there are any unforeseen widespread failures, then the Master will continue > to failover as opposed to proceeding to remove a large percentage of slaves. > Operators can tune this for their environment. > > The current default is catered towards non-production environments, but I've > added a TODO to explore adding a '--production' flag that will allow us to > use different defaults (< 100% removal limit, no auto-initialization, etc). > > See the flag description for more details. > > We should add a 'Percentage' abstraction in the future per MESOS-1162. > > > Diffs > ----- > > src/master/constants.hpp 52d8d779e92f3be2b84d9237a9abbd2f580c0906 > src/master/constants.cpp 1cb8f22558b5bd3d90b24fcc6bf70cbe615a335c > src/master/flags.hpp 024f86d93824a20ce42c28b8264576f1cb715d0e > src/master/main.cpp f12f20a1eabd163c4f35056bf01f28f3edd408a9 > src/master/master.cpp 3c3c989543167afb7d368a19a16457ed00e6be0c > > Diff: https://reviews.apache.org/r/20097/diff/ > > > Testing > ------- > > make check > > It is currently not possible to exercise this case, given the use of EXIT. > > > Thanks, > > Ben Mahler > >
