> On July 17, 2013, 12:15 a.m., Bill Farner wrote:
> > src/sched/sched.cpp, line 390
> > <https://reviews.apache.org/r/12603/diff/1/?file=322203#file322203line390>
> >
> >     I'm ignorant to the implications of this, but can you confirm/deny the
> >     following behavior?
> >
> >     - Queue holds [U1, U2, U3, U4] which have yet to be processed.
> >
> >     - Update U1 arrives, this code processes it.
> >
> >     - Scheduler aborts.
> >
> >     - New scheduler receives retried [U1, U2, U2, U4] (in any order)
>
> Vinod Kone wrote:
>     Not sure which queue you are referring to, but I'm assuming you mean the
>     'uuids' set?
>
>     An update goes into 'uuids' only after it is processed (i.e.,
>     Scheduler::statusUpdate() returns) by the scheduler.
>
>     In the above scenario if a duplicate U1 is enqueued in the libprocess
>     queue and the scheduler aborts after handling the original U1, the driver
>     would've aborted and we would have never come here.
>
>     When a new scheduler (and driver) becomes the leader they get updates
>     fresh from mesos.
>
>     Does that make sense?
>
> Bill Farner wrote:
>     I think you explained behavior for a slightly different scenario than
>     what i'm attempting to describe.
>
>     - The driver has received [U1, U2, U3, U4], but the scheduler
>     implementation has yet to receive/ACK them.
>
>     - A duplicate U1 arrives.
>
>     - Scheduler aborts.
>
>     What happens in that scenario? Based on the verbiage in the diff, it
>     sounds as though U1 is ACKed to other parts of the system, and will not be
>     retried when the new scheduler takes over.
It is not possible for U1, U2, U3 and U4 to have been processed by the driver while the scheduler has not yet processed U1. The driver hands each update to the scheduler through a synchronous call, so the scheduler must have processed U1 before the driver can process U2.

Subsequently, if the driver is processing a duplicate U1 (i.e., U1 is already in 'uuids'), it means the driver did not abort when it dealt with the original U1. If the driver had aborted while handling the original U1, the 'aborted' flag would've been set before the driver processed any other updates.

Makes sense? I now realize that my comments in the code didn't do justice to the subtlety of the semantics. Happy to expand the comments once you are satisfied with the correctness.

- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12603/#review23212
-----------------------------------------------------------


On July 17, 2013, 1:34 a.m., Vinod Kone wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/12603/
> -----------------------------------------------------------
>
> (Updated July 17, 2013, 1:34 a.m.)
>
>
> Review request for mesos, Benjamin Hindman and Ben Mahler.
>
>
> Bugs: MESOS-551
>     https://issues.apache.org/jira/browse/MESOS-551
>
>
> Repository: mesos
>
>
> Description
> -------
>
> See summary.
>
>
> Diffs
> -----
>
>   src/sched/sched.cpp 7ea82e547c612159c9fa24fb6d62e3d2b5f11982
>   src/tests/status_update_manager_tests.cpp 42395324dfe49659bee2229c6573ffef0874d923
>
> Diff: https://reviews.apache.org/r/12603/diff/
>
>
> Testing
> -------
>
> make check (OSX and Linux)
>
>
> Thanks,
>
> Vinod Kone
>
>
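For readers following the thread, here is a rough sketch of the ordering argument made in the reply above. It is a toy model, not the code in src/sched/sched.cpp: ToyDriver, ToyScheduler, AbortingScheduler, handle(), and the acked/dropped logs are all hypothetical names made up for illustration. It also assumes that a duplicate update (one whose UUID is already in 'uuids') is acknowledged again without re-invoking the scheduler, which is how the diff's wording was read in the thread but is not confirmed here.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; none of these are the real Mesos types.
struct Update { std::string uuid; };

class ToyDriver;

struct ToyScheduler {
  virtual ~ToyScheduler() = default;
  virtual void statusUpdate(ToyDriver& driver, const Update& update) = 0;
};

class ToyDriver {
public:
  explicit ToyDriver(ToyScheduler& scheduler) : scheduler_(scheduler) {}

  void abort() { aborted_ = true; }

  // Updates are handled strictly one at a time, in arrival order.
  void handle(const Update& update) {
    if (aborted_) {            // Once aborted, nothing is delivered or ACKed.
      dropped.push_back(update.uuid);
      return;
    }
    if (uuids_.count(update.uuid) > 0) {
      // Duplicate of an update the scheduler already finished processing.
      // Assumption: ACK it again without calling the scheduler.
      acknowledge(update);
      return;
    }
    scheduler_.statusUpdate(*this, update);  // Synchronous call.
    if (aborted_) {            // Scheduler aborted inside the callback.
      dropped.push_back(update.uuid);
      return;                  // Not recorded in 'uuids', not ACKed.
    }
    uuids_.insert(update.uuid);  // Recorded only after the callback returns.
    acknowledge(update);
  }

  std::vector<std::string> acked;    // UUIDs ACKed, in order.
  std::vector<std::string> dropped;  // UUIDs seen but never ACKed.

private:
  void acknowledge(const Update& update) {
    assert(!aborted_);  // An aborted driver never sends an ACK.
    acked.push_back(update.uuid);
  }

  ToyScheduler& scheduler_;
  std::set<std::string> uuids_;
  bool aborted_ = false;
};

// Test scheduler that aborts the driver when it sees a chosen UUID.
struct AbortingScheduler : ToyScheduler {
  explicit AbortingScheduler(std::string uuid) : abortOn(std::move(uuid)) {}

  void statusUpdate(ToyDriver& driver, const Update& update) override {
    seen.push_back(update.uuid);
    if (update.uuid == abortOn) driver.abort();
  }

  std::string abortOn;
  std::vector<std::string> seen;
};
```

In this model, a duplicate U1 can only reach the "already in 'uuids'" branch if the original U1 was handled without an abort. An abort during any callback leaves that update and everything after it un-ACKed, so a new scheduler would be expected to receive them again.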
