> On Sept. 4, 2013, 4:30 a.m., Vinod Kone wrote:
> > src/slave/slave.cpp, lines 665-670
> > <https://reviews.apache.org/r/13954/diff/1/?file=347608#file347608line665>
> >
> >     +10 on having UUIDs on messages to track the runs. 
> >     
> >     That said this comment is more generic and applies to every message 
> > received by the slave. So, maybe this is not the right place to put it? How 
> > about putting this comment in initialize() where we install the handlers 
> > for messages?
> >     
> >     Also for reregistered() (and registered() too?) I propose we ignore the 
> > message when in RECOVERING unless info.id() != slaveId, in which case we 
> > crash. Basically the same as we do in DISCONNECTED and RUNNING states.

I've created a ticket to track this proposal. Since I'm leaving this bug open
as a potential crash in the slave under frequent restarts, I'd like to keep
this note here so that users who hit the crash can immediately understand what
went wrong and find MESOS-676 and MESOS-677 in the process.

Now that MESOS-677 exists, the proposal is documented in a ticket rather than
only in our minds, so I'll omit the comment in initialize().

In your comment, is it possible to be RECOVERING when we get registered()? If 
not, let's leave that as a crash!

For reregistered(), I think it would be better to err on the side of caution:
if we stay up and ignore these messages, we could misinterpret a re-registered
message as a valid re-registration acknowledgement. This could have happened
in MESOS-676 had the slave ignored some of the re-registration messages
instead of crashing, given that the master sent 11 bogus acknowledgements in
the midst of recovery.
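
To make the concern concrete, here is a rough toy model of the "ignore while
RECOVERING" policy. Everything in it is hypothetical (ToySlave, replay, the
slave ID "S1"); it is not the real Slave class, and the handling I show for
DISCONNECTED and RUNNING is my assumption for illustration. The split of the
11 acknowledgements around the end of recovery is also arbitrary.

```cpp
#include <cassert>
#include <string>

// Hypothetical toy model; names are illustrative, not the real Slave API.
enum class State { RECOVERING, DISCONNECTED, RUNNING };

struct ToySlave {
  State state = State::RECOVERING;
  std::string slaveId = "S1";
  int ignored = 0;

  // Proposed policy: drop the message while RECOVERING. Returns false
  // where the real slave would crash (slave ID mismatch).
  bool reregistered(const std::string& id) {
    if (id != slaveId) {
      return false;  // would crash
    }
    switch (state) {
      case State::RECOVERING:
        ++ignored;  // proposed: ignore instead of crashing
        break;
      case State::DISCONNECTED:
        state = State::RUNNING;  // assumed: treated as an acknowledgement
        break;
      case State::RUNNING:
        ++ignored;  // assumed: already re-registered
        break;
    }
    return true;
  }
};

// Replays `bogus` acknowledgements left over from before the restart.
// Recovery completes after `duringRecovery` of them have been delivered.
State replay(int bogus, int duringRecovery) {
  ToySlave slave;
  for (int i = 0; i < bogus; ++i) {
    if (i == duringRecovery) {
      slave.state = State::DISCONNECTED;  // recovery finished
    }
    slave.reregistered("S1");
  }
  return slave.state;
}
```

With replay(11, 5) the toy slave ends up RUNNING even though none of the
acknowledgements answered a request from the current run, which is the
misinterpretation I'm worried about. With replay(11, 11) all of them are
dropped and the slave is still RECOVERING.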

Should we revisit this for 0.15.0? Perhaps we can think about a long-term
solution to this problem. =/
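
One shape the UUID idea could take, as a minimal sketch only: the slave
generates a fresh ID per run and accepts an acknowledgement only if it echoes
that ID. The runId field, ReregisteredMessage struct and RunTrackingSlave type
are all assumptions for illustration; the existing messages carry no such
field as far as this review is concerned.

```cpp
#include <cassert>
#include <string>

// Hypothetical message type; the runId field is an assumption.
struct ReregisteredMessage {
  std::string slaveId;
  std::string runId;  // assumed: echoed back by the master from the request
};

// Hypothetical slave that tags each run with its own ID.
struct RunTrackingSlave {
  std::string slaveId;
  std::string currentRunId;  // assumed: fresh UUID generated on each start
  bool registered;

  RunTrackingSlave(const std::string& id, const std::string& run)
    : slaveId(id), currentRunId(run), registered(false) {}

  // Accepts the acknowledgement only if both the slave ID and the run ID
  // match; anything left over from an earlier run is rejected.
  bool handle(const ReregisteredMessage& message) {
    if (message.slaveId != slaveId || message.runId != currentRunId) {
      return false;
    }
    registered = true;
    return true;
  }
};
```

An acknowledgement tagged with the previous run's ID would be rejected by
handle() without the slave having to guess from its state alone whether the
message is bogus.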


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13954/#review25876
-----------------------------------------------------------


On Sept. 4, 2013, 12:31 a.m., Ben Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/13954/
> -----------------------------------------------------------
> 
> (Updated Sept. 4, 2013, 12:31 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Vinod Kone.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> These declarations were missing from the header file, and since they are
> defined at the bottom of slave.cpp they cannot be used by all of the
> slave.cpp code.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp ce2b0dad1228363496875f233c14a602d9fb9dbe 
>   src/slave/slave.cpp 7f23b56c4db8b1828b3e0d02b7a4e7375cb76211 
> 
> Diff: https://reviews.apache.org/r/13954/diff/
> 
> 
> Testing
> -------
> 
> manual
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>
