> On March 4, 2014, 12:49 a.m., Vinod Kone wrote: > > src/master/registrar.cpp, lines 222-227 > > <https://reviews.apache.org/r/18675/diff/1/?file=507859#file507859line222> > > > > The semantics here are a bit hard to follow. > > > > How about instead of "recovering" variable you make 'recovered' > > Option<Promise<Registry> >? One side benefit is that when some one calls > > admit() etc with calling recover() first you could immediately return a > > failure if 'recovered.isNone()'. I think this explicitness is better than > > having those calls waiting forever? > > > > Also, why does it have to be MasterInfo? IIUC, all you want is someone > > to update the version of the registry. Can we just use a "bool" or "UUID" > > instead of MasterInfo? > > Ben Mahler wrote: > 1. Promises are not copyable or assignable, so you cannot use a Promise > inside an Option, unless you make it a pointer. > > 2. The calls wait forever only if a call to recover() does not occur. We > could, as you said, fail any operations that are initiated by introducing > another check in the admit/readmit/remove methods: > > Future<bool> RegistrarProcess::admit(const SlaveInfo& info) > { > if (!info.has_id()) { > return Failure("SlaveInfo is missing the 'id' field"); > } > > if (recovered.isPending() && !recovering) { > return Failure("Attempted to admit prior to recovering"); > } > > return recovered.future() > .then(defer(self(), &Self::_admit, info)); > } > > However, the old semantics were to provide "auto-recovery". We could > still provide auto-recovery if we had a factory function that allows the > Master to construct a Registrar by passing MasterInfo. > > 3. You are correct, we technically do not need MasterInfo, but flipping a > boolean seems hacky, and a UUID is a bit out of place. The UUID would change > with each leader election, but what also changes with each leader election? > The MasterInfo. This was why I went with MasterInfo. > > Let me know your thoughts on these points! > > Ben Mahler wrote: > We could use an Option<Owned<Promise>> to eliminate the need for a > 'recovering' boolean: > > if (!recovering && recovered.future.isPending()) // Before. > if (recovered.isNone()) // After. > > I'll look at cleaning this up. > > Vinod Kone wrote: > The argument for 3) seems a bit backwards. You actually had to complicate > the code (needing factoring functions and 2)) because you are using > MasterInfo. Why use it in the first place when you don't have to and go > through hoops? I don't see why a UUID is out of place? > > Looking at it in a different way, a registry is a place to store > persistent information that other masters can use to recover state. A new > master doesn't need to recovery an old master's MasterInfo. The MasterInfo is > just present to resolve conflicts and seems like an overkill. > > Ben Mahler wrote: > Thanks for bringing this up, I just realized that it is sufficient to > simply 'store' what we 'fetch' in order to cause a UUID change on the > underlying state entry, note the comment here: > > inline process::Future<Option<Variable> > State::store(const Variable& > variable) > { > // Note that we try and swap an entry even if the value didn't change! > UUID uuid = UUID::fromBytes(variable.entry.uuid()); > > // Create a new entry to replace the existing entry provided the > // UUID matches. > Entry entry; > entry.set_name(variable.entry.name()); > entry.set_uuid(UUID::random().toBytes()); > entry.set_value(variable.entry.value()); > > return storage->set(entry, uuid) > .then(lambda::bind(&State::_store, entry, lambda::_1)); > } > > Surprised we didn't notice this earlier! Perhaps we should make this more > clear in the state API.
I'd like us to caution assuming these _implementation_ semantics. What's the problem with storing the MasterInfo? Are we worried about performance? - Benjamin ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18675/#review36069 ----------------------------------------------------------- On March 6, 2014, 8:12 p.m., Ben Mahler wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/18675/ > ----------------------------------------------------------- > > (Updated March 6, 2014, 8:12 p.m.) > > > Review request for mesos, Benjamin Hindman and Vinod Kone. > > > Bugs: MESOS-764 > https://issues.apache.org/jira/browse/MESOS-764 > > > Repository: mesos-git > > > Description > ------- > > As a safety measure, we would like to force a version change on the Registry > as a result of the recovery process. This helps prevent a "rogue" Master > (that still believes it is leading) from performing writes to the Registry. > > This change adds a 'Recover' operation which adds the latest MasterInfo to > the Registry. Unfortunately, since the Registrar is injected to the Master, I > had to add 'MasterInfo' as an argument to the 'recover' function, thus > altering the design somewhat. > > The current semantics are that there one must call 'recover' before calling > other operations, otherwise these other operations will fail. > > > Diffs > ----- > > src/master/registrar.hpp 20734afc69055197e9ab90d42253c56e4af4b97c > src/master/registrar.cpp 37337c07b24a96e71910b7c83085d159361a1188 > src/tests/registrar_tests.cpp 3bf42bd77a10470a2afc6fd8e1da30d6134e792c > > Diff: https://reviews.apache.org/r/18675/diff/ > > > Testing > ------- > > Added a small test. > > > Thanks, > > Ben Mahler > >
