> On March 20, 2014, 9:21 p.m., Vinod Kone wrote: > > src/master/master.hpp, line 421 > > <https://reviews.apache.org/r/19383/diff/1/?file=526626#file526626line421> > > > > Why cache instead of a circular buffer? > > Ben Mahler wrote: > What's the issue here? > > Circular buffer does not support lookup. In the past we've used circular > buffer because we've wanted two properties: > > 1. Set semantics. > 2. LRI semantics. > > In this case, we want LRU semantics and cache<X, Nothing> provides the > desired set semantics.
I see. Maybe we need a cache like thing that holds just keys instead of keys and values. Using a map for a set seems a bit weird. Maybe a TODO? > On March 20, 2014, 9:21 p.m., Vinod Kone wrote: > > src/master/master.cpp, lines 609-610 > > <https://reviews.apache.org/r/19383/diff/1/?file=526627#file526627line609> > > > > But they could still fire if they are already dispatched and are in the > > queue right? Would that cause problems? > > Ben Mahler wrote: > It's a race of course, but it's likely that the dispatches would be > dropped before the new Master started. Also, the timing required for these to > fire but be in the queue when we're destructing is pretty precise, do you > have any suggestions here? I suggest to update the comment to reflect what you just said for one :) I'm concerned because I had a similar case when implementing authentication (see a few lines above where i had to do a discard) which caused unexpected issues in subsequent tests that were hard to debug. How about passing something to __recoverTimeout() that is unique to each invocation. MasterInfo perhaps? > On March 20, 2014, 9:21 p.m., Vinod Kone wrote: > > src/master/master.cpp, line 1946 > > <https://reviews.apache.org/r/19383/diff/1/?file=526627#file526627line1946> > > > > why the change? what does activated mean? > > Ben Mahler wrote: > I changed this because "slave does not exist" is pretty generic and does > not capture those slaves that are registering, re-registering, being removed, > etc. These slaves "exist" despite what the log message says, so I'm trying to > be a bit more accurate in our wording. Suggestions? I see. I was thrown off because of the getSlave() helper. "activated" seems fine. > On March 20, 2014, 9:21 p.m., Vinod Kone wrote: > > src/master/master.cpp, line 2072 > > <https://reviews.apache.org/r/19383/diff/1/?file=526627#file526627line2072> > > > > s/reply/send(from,/ ? > > Ben Mahler wrote: > Why change this, the method is using 'reply' elsewhere. I was confused because you changed reply() to send() in registerSlave(). Was that intentional? If not, please revert that. > On March 20, 2014, 9:21 p.m., Vinod Kone wrote: > > src/master/master.cpp, lines 3267-3270 > > <https://reviews.apache.org/r/19383/diff/1/?file=526627#file526627line3267> > > > > Why do it this way? Why not do the bulk of this method in > > _removeSlave() ? > > Ben Mahler wrote: > Have you thought through the implications of that approach? > > If we keep the slave around in-memory when we decided to remove it, we > need to treat this slave in a special manner to keep things consistent for > frameworks. > > We're likely to end up with bugs / consistency problems if we do this. > Say we decide to remove the slave but we keep it in-memory. If a status > update arrives from this slave before the write completes, we better drop the > update or else we're potentially going to lie about a TASK_LOST in > _removeSlave. > > The only way the Registrar can deny a removal operation is if the slave > is already removed, so removing it from the in-memory state first is safe. We > also don't have to think about all the cases where a slave might send a > message between removeSlave and _removeSlave. I am not convinced this is the right approach. The way I see using a commit log is to persist the intent first and then changing the internal state. For example, the way you did adding a slave seems natural to me. You stored the intent locally (added to activated) and then persisted it in the registrar. After registrar persisted you created the slave structure. It would be nice if the logic is symmetric for removing a slave. I recommend just adding the slave to 'removing', removing it from 'activated' and passing the Slave* to the continuation. - Vinod ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/19383/#review37918 ----------------------------------------------------------- On March 21, 2014, 12:56 a.m., Ben Mahler wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/19383/ > ----------------------------------------------------------- > > (Updated March 21, 2014, 12:56 a.m.) > > > Review request for mesos, Benjamin Hindman and Vinod Kone. > > > Bugs: MESOS-764 > https://issues.apache.org/jira/browse/MESOS-764 > > > Repository: mesos-git > > > Description > ------- > > This implements the Registry-backed Master, with the following exceptions > that will be addressed in follow up changes: > > -Note that the --registry_strict flag is enforced to be false in > master/main.cpp. > -Reconciliation remains unimplemented as before. > -Improvements can be made to killTask, specifically we should add SlaveID to > the message in order to drop fewer requests for unknown slaves. > -Orthogonally, this does not address MESOS-682. > > I've updated 'deactivated' slaves to be a cache of SlaveIDs rather than UPIDs > as this was the intent originally (we were concerned about the unbounded > growth of the set, but cache<SlaveID, Nothing> keeps a fixed capacity). > > > Diffs > ----- > > src/master/constants.hpp cdaaad060d4ee777f8b0838b63c0fd031da861ea > src/master/constants.cpp 18548834468243bef8ae090f70363e2b9f571ac5 > src/master/master.hpp a37a2a2fa3713b8251eb318dfb45e0793cb344ff > src/master/master.cpp 3b28f72be2016997e30649d2b12ed0e8f1a57dd5 > src/messages/messages.proto c26a3d0e69bbbd447c859cf175c139ab8948fde2 > src/slave/slave.cpp d8d3e0fa54972201d72b2650ec0ba922a4912d54 > > Diff: https://reviews.apache.org/r/19383/diff/ > > > Testing > ------- > > This change preserves the previous semantics and so all existing tests pass. > > This is because the Registrar can only operate in a "non-strict" manner. > > An unfortunate effect of this change is that many tests run slower due to the > fact that messages are dropped while we're recovering, an alternative > approach here would be to re-enqueue *all* incoming messages through > recover(). However, this adds queuing delay to each message processed in the > Master and the performance implications of this are not well understood for > large clusters. > > > Thanks, > > Ben Mahler > >