-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31515/#review74478
-----------------------------------------------------------

src/master/master.cpp
<https://reviews.apache.org/r/31515/#comment121057>

    Nothing to do with the patch, but I think this line is not very readable. Why would I cancel a removal if the slave is not in `slaves.recovered`? I had to check what `slaves.recovered` actually is to understand it. Perhaps a comment here would be good.


- Alexander Rojas


On Feb. 27, 2015, 3:58 a.m., Ben Mahler wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31515/
> -----------------------------------------------------------
>
> (Updated Feb. 27, 2015, 3:58 a.m.)
>
>
> Review request for mesos and Vinod Kone.
>
>
> Bugs: MESOS-2392
>     https://issues.apache.org/jira/browse/MESOS-2392
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Much like we rate limit slave removals in the common path (MESOS-1148), we
> need to rate limit slave removals that occur during master recovery. When a
> master recovers and is using a strict registry, slaves that do not
> re-register within a timeout will be removed.
>
> Currently there is a safeguard in place to abort when too many slaves have
> not re-registered. However, in the case of a transient partition, we don't
> want to remove large sections of slaves without rate limiting.
>
>
> Diffs
> -----
>
>   src/master/master.hpp 8c44d6ed57ad1b94a17bef8142a5e6a15889a810
>   src/master/master.cpp 76e217d16c03e587ea4c0afca94c58b2212f0f93
>
> Diff: https://reviews.apache.org/r/31515/diff/
>
>
> Testing
> -------
>
> make check
>
> Added tests in subsequent review.
>
>
> Thanks,
>
> Ben Mahler
>
>
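
For readers without the diff at hand, the behaviour described in the quoted request (an abort safeguard plus rate-limited removal of slaves that did not re-register) can be sketched roughly as follows. This is only an illustration under stated assumptions and is not the code in src/master/master.cpp: `RemovalPlan`, `planRemovals`, and all four parameters are hypothetical names, and removals are modelled as a fixed schedule instead of real timers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch; not a Mesos type.
struct RemovalPlan {
  bool aborted;                      // True if the safeguard tripped.
  std::vector<double> removalTimes;  // Seconds after the timeout, one per slave.
};

// Plans the removal of slaves that failed to re-register after failover.
//   total:         slaves known to the registry before failover
//   missing:       slaves that did not re-register within the timeout
//   limitFraction: safeguard; abort if missing / total exceeds this
//   perSecond:     maximum removals per second (the rate limit)
RemovalPlan planRemovals(std::size_t total,
                         std::size_t missing,
                         double limitFraction,
                         double perSecond)
{
  assert(total > 0 && missing <= total && perSecond > 0.0);

  RemovalPlan plan{false, {}};

  // Safeguard: too many slaves are missing, so remove none of them.
  if (static_cast<double>(missing) / total > limitFraction) {
    plan.aborted = true;
    return plan;
  }

  // Rate limit: space the removals evenly instead of doing them all at once.
  for (std::size_t i = 0; i < missing; ++i) {
    plan.removalTimes.push_back(i / perSecond);
  }

  return plan;
}
```

Under these assumptions, `planRemovals(100, 4, 0.5, 2.0)` schedules the four removals at 0, 0.5, 1 and 1.5 seconds, while `planRemovals(100, 60, 0.5, 2.0)` trips the safeguard and schedules nothing. Spacing the removals out leaves time for slaves behind a transient partition to re-register before their turn comes up.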
