> On Aug. 28, 2014, 8:37 a.m., Adam B wrote: > > src/common/attributes.cpp, lines 161-164 > > <https://reviews.apache.org/r/25111/diff/4/?file=670589#file670589line161> > > > > In your code, this.isSuperset(that) returns false if any of this's > > attributes are not present in that. It therefore returns true only if all > > of this's attributes exist in that, meaning this is a __subset__ of that. > > According to the comment, isSuperset should return true "if this is a > > superset of that". Perhaps this is supposed to be the implementation of > > isSubset? > > > > Maybe we should rename the function to read "this.isSupersetOf(that)", > > to disambiguate from the possibility that "this.isSuperset(that)" might > > mean "for object this, is that a superset of it?"
My intention was that this is a superset of that, this.isSuperset(that) returns true iff this contains all the items in this which aren't in that. I've refactored those two functions around quite a bit and apparently forgot a logic flip (This is why unit tests are good). I definitely see that it could be read ambiguously. Renaming the functions (locally, patch up shortly), and swapping implementations. > On Aug. 28, 2014, 8:37 a.m., Adam B wrote: > > src/master/master.hpp, lines 911-912 > > <https://reviews.apache.org/r/25111/diff/4/?file=670590#file670590line911> > > > > I'd like to discuss the implications of this with you. I wonder if > > there's some way we can enforce this, perhaps by wrapping the SlaveInfo > > with a watcher/bodyguard that notifies the registrar if the SlaveInfo > > changes. Definitely there are ways to guard this. The best thing to do would be just having the registrar own + update the running state (Although I don't know the threading implications of that). There is a lot of context specific to states spread thoughout the code (And things like only if authenticated we can do x), which seem like they can be refactored to make it simpler / more consistent to enforce with less duplicated code. > On Aug. 28, 2014, 8:37 a.m., Adam B wrote: > > src/master/master.hpp, line 1213 > > <https://reviews.apache.org/r/25111/diff/4/?file=670590#file670590line1213> > > > > What do you mean by "push a slave"? Update its SlaveInfo? Yes, I mean update it's slave info. Updated the comment to hopefully be more clear (and fix a typo, better 70 column wrap, add more periods to comments in the function). > On Aug. 28, 2014, 8:37 a.m., Adam B wrote: > > src/master/master.cpp, line 2982 > > <https://reviews.apache.org/r/25111/diff/4/?file=670591#file670591line2982> > > > > Should probably handle the Future returned by registrar->apply(), in > > case it fails/discards, and/or if you want to wait for successful > > completion before performing any further actions. Updated. Also updated the logic so we won't call _reregisterSlave twice in the event that we have changed attribuets/resouces. > On Aug. 28, 2014, 8:37 a.m., Adam B wrote: > > src/master/master.cpp, line 3019 > > <https://reviews.apache.org/r/25111/diff/4/?file=670591#file670591line3019> > > > > Probably not often. The slave would have to send two > > ReregisterSlaveMessages in the time it takes for the registrar update to > > occur. This could happen if a) The slave restarted quickly; or b) the > > registrar update takes too long, perhaps due to networking or disk errors. The problem is that when this happens we don't update the slaveInfo. Since it may be because of the registrar taking too long, we have an issue where we'll just make life worse for the registrar if we keep piling on updates. If the slave restarts too quickly, we lose the attribute/resource change. Unfortunately the fast restart may be increasingly common in this case as an administrator sees they made a mistake and updates the attributes again. I'm not sure how to catch that at the moment. > On Aug. 28, 2014, 8:37 a.m., Adam B wrote: > > src/slave/slave.cpp, line 2732 > > <https://reviews.apache.org/r/25111/diff/4/?file=670593#file670593line2732> > > > > Who's going to call this? How about installing a protobuf handler for > > ReconfigureSlaveMessage or UpdateSlaveInfoMessage or something? > > If we have that just take an entire SlaveInfo, we'll have to check for > > hostname/port/SlaveID/checkpoint changes and disallow them for now, unless > > we do a full unregister/shutdown. The ping method still isn't well determined at the moment. Discussion ongoing at MESOS-1739. > On Aug. 28, 2014, 8:37 a.m., Adam B wrote: > > src/slave/slave.cpp, lines 2736-2738 > > <https://reviews.apache.org/r/25111/diff/4/?file=670593#file670593line2736> > > > > Once you get it working for the superset, I'd like to discuss how we > > can relax this restriction. It's perfectly reasonable for us to rescind > > existing offers, and losses of attributes/resources may be acceptable if no > > current tasks are using them. But it could get into some fun race > > conditions if frameworks are already trying to launch tasks on a slave > > while it is reconfiguring. Per comments on MESOS-1739, just updating to supersets can cause issues if people have negative checks for attributes on a node (Make sure the node isn't tagged with X). Definitely more discussion to be had here. > On Aug. 28, 2014, 8:37 a.m., Adam B wrote: > > src/slave/slave.cpp, line 2746 > > <https://reviews.apache.org/r/25111/diff/4/?file=670593#file670593line2746> > > > > How about reversing the logic to !(newResources >= resources) so it > > reads more like the error message? Only operator <= is defined at the moment for resources which is why I structued it as is. I can add an operator >= if you'd like. > On Aug. 28, 2014, 8:37 a.m., Adam B wrote: > > src/slave/slave.cpp, lines 2750-2751 > > <https://reviews.apache.org/r/25111/diff/4/?file=670593#file670593line2750> > > > > Changing the slave's internal state to DISCONNECTED does not actually > > "unregister" it in the master. The master will continue to try to launch > > tasks on the slave, offer its resources up to the frameworks, and any other > > business as usual. This is purely internal state to notify the slave that > > it thinks it still needs to register. > > That approach is fine as long as we're sticking to strict supersets. If > > we allow subsets, the slave will have to stop accepting new tasks and tell > > the master to rescind its outstanding offers. Only then can the slave > > update its local state and reregister to update the master/registry. That > > would be more of an "unregister" approach. What would be the proper way to do this then? (I'd rather not leave a ticking time bomb for the next iteration on this patchset where we start adding more capabilities). - Cody ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25111/#review51753 ----------------------------------------------------------- On Aug. 28, 2014, 4:42 p.m., Patrick Reilly wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/25111/ > ----------------------------------------------------------- > > (Updated Aug. 28, 2014, 4:42 p.m.) > > > Review request for mesos, Adam B and Benjamin Hindman. > > > Bugs: MESOS-1739 > https://issues.apache.org/jira/browse/MESOS-1739 > > > Repository: mesos-git > > > Description > ------- > > Add basic stub for dynamic slave attributes > > > Diffs > ----- > > src/common/attributes.hpp 0a043d5 > src/common/attributes.cpp aab114e > src/master/master.hpp c9f989a > src/master/master.cpp 2508b38 > src/slave/slave.hpp 9d4607e > src/slave/slave.cpp 5c76dd1 > > Diff: https://reviews.apache.org/r/25111/diff/ > > > Testing > ------- > > This is currently a work in progress, (WIP) > > > Thanks, > > Patrick Reilly > >
