> On Sept. 4, 2013, 4:30 a.m., Vinod Kone wrote: > > src/slave/slave.cpp, lines 665-670 > > <https://reviews.apache.org/r/13954/diff/1/?file=347608#file347608line665> > > > > +10 on having UUIDs on messages to track the runs. > > > > That said this comment is more generic and applies to every message > > received by the slave. So, maybe this is not the right place to put it? How > > about putting this comment in initialize() where we install the handlers > > for messages? > > > > Also for reregistered() (and registered() too?) I propose we ignore the > > message when in RECOVERING unless info.id() != slaveId, in which case we > > crash. Basically the same as we do in DISCONNECTED and RUNNING states.
I've created the ticket as the way to track this proposal. Since I'm leaving this bug open as a potential crash in the slave under frequent restarts, I'd like this note here so that users can immediately understand what went wrong and notice MESOS-676 and MESOS-677 in the process. Now that I've created MESOS-677 it's not only in our minds but also documented in a ticket, so I'll omit a comment in initialize(). In your comment, is it possible to be RECOVERING when we get registered()? If not, let's leave that as a crash! For re-registered, I think it would be better to err on the side of caution as we could mis-interpret a re-registered message as a valid re-registration acknowledgement if we stay up and ignore them! This would have potentially occurred in MESOS-676 if the slave had not crashed and rather ignored some of the re-registration messages, given there were 11 bogus acknowledgments being sent from the master in the midst of recovery. Should we revisit for 0.15.0? Perhaps we can think about a long term solution to this problem. =/ - Ben ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/13954/#review25876 ----------------------------------------------------------- On Sept. 4, 2013, 12:31 a.m., Ben Mahler wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/13954/ > ----------------------------------------------------------- > > (Updated Sept. 4, 2013, 12:31 a.m.) > > > Review request for mesos, Benjamin Hindman and Vinod Kone. > > > Repository: mesos-git > > > Description > ------- > > These were missing declarations in the header file and since they are defined > at the bottom of slave.cpp they will not be used by all of the slave.cpp code. > > > Diffs > ----- > > src/slave/slave.hpp ce2b0dad1228363496875f233c14a602d9fb9dbe > src/slave/slave.cpp 7f23b56c4db8b1828b3e0d02b7a4e7375cb76211 > > Diff: https://reviews.apache.org/r/13954/diff/ > > > Testing > ------- > > manual > > > Thanks, > > Ben Mahler > >
