> On July 17, 2013, 12:15 a.m., Bill Farner wrote:
> > src/sched/sched.cpp, line 390
> > <https://reviews.apache.org/r/12603/diff/1/?file=322203#file322203line390>
> >
> >     I'm ignorant of the implications of this, but can you confirm/deny the
> >     following behavior?
> >
> >     - Queue holds [U1, U2, U3, U4] which have yet to be processed.
> >
> >     - Update U1 arrives, this code processes it.
> >
> >     - Scheduler aborts.
> >
> >     - New scheduler receives retried [U1, U2, U2, U4] (in any order)
>
> Vinod Kone wrote:
>     Not sure which queue you are referring to, but I'm assuming you mean the
>     'uuids' set?
>
>     An update goes into 'uuids' only after it is processed (i.e.,
>     Scheduler::statusUpdate() returns) by the scheduler.
>
>     In the above scenario, if a duplicate U1 is enqueued in the libprocess
>     queue and the scheduler aborts after handling the original U1, the driver
>     would've aborted and we would have never come here.
>
>     When a new scheduler (and driver) becomes the leader, they get updates
>     fresh from mesos.
>
>     Does that make sense?
>
> Bill Farner wrote:
>     I think you explained behavior for a slightly different scenario than
>     what I'm attempting to describe.
>
>     - The driver has received [U1, U2, U3, U4], but the scheduler
>     implementation has yet to receive/ACK them.
>
>     - A duplicate U1 arrives.
>
>     - Scheduler aborts.
>
>     What happens in that scenario? Based on the verbiage in the diff, it
>     sounds as though U1 is ACKed to other parts of the system, and will not be
>     retried when the new scheduler takes over.
>
> Vinod Kone wrote:
>     It is not possible for U1, U2, U3 and U4 to have been processed by the
>     driver while the scheduler has not yet processed U1. The scheduler should've
>     processed U1 before U2 can be processed by the driver, since it is a
>     synchronous call into the scheduler. Subsequently, if a duplicate U1 (i.e.,
>     U1 is in 'uuids') is being processed by the driver, it means the driver has
>     not aborted when it dealt with the original U1. If the driver had aborted
>     while handling the original U1, the 'aborted' flag would've been set before
>     the driver processes any other updates. Makes sense?
>
>     I now realize that my comments in the code didn't do justice to the
>     subtlety of the semantics. Happy to expand the comments once you are
>     satisfied with the correctness.
>
Thanks for the detail. This and the offline conversation helped clarify. I
think I can do a more competent review now :-)


- Bill


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12603/#review23212
-----------------------------------------------------------


On July 17, 2013, 1:34 a.m., Vinod Kone wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/12603/
> -----------------------------------------------------------
>
> (Updated July 17, 2013, 1:34 a.m.)
>
>
> Review request for mesos, Benjamin Hindman and Ben Mahler.
>
>
> Bugs: MESOS-551
>     https://issues.apache.org/jira/browse/MESOS-551
>
>
> Repository: mesos
>
>
> Description
> -------
>
> See summary.
>
>
> Diffs
> -----
>
>   src/sched/sched.cpp 7ea82e547c612159c9fa24fb6d62e3d2b5f11982
>   src/tests/status_update_manager_tests.cpp 42395324dfe49659bee2229c6573ffef0874d923
>
> Diff: https://reviews.apache.org/r/12603/diff/
>
>
> Testing
> -------
>
> make check (OSX and Linux)
>
>
> Thanks,
>
> Vinod Kone
>
>
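
For anyone following the thread later, here is a minimal sketch of the ordering
argument in the quoted discussion. It is not the code in src/sched/sched.cpp.
SketchDriver, Update, and handle() are hypothetical names, the callback stands
in for Scheduler::statusUpdate(), and acknowledging a duplicate without
re-delivering it is an assumption based on how the thread reads the diff.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for a status update; only the UUID matters here.
struct Update {
  std::string uuid;
};

// Hypothetical, simplified driver. The member names mirror the thread
// ('uuids', 'aborted'), but this is an illustration, not the Mesos driver.
class SketchDriver {
public:
  // The callback plays the role of Scheduler::statusUpdate(). Returning
  // false models the scheduler aborting the driver during the call.
  explicit SketchDriver(std::function<bool(const Update&)> callback)
    : callback_(std::move(callback)) {}

  // Handles one update at a time; the callback is invoked synchronously.
  // Returns true if the update would be acknowledged (assumed behavior).
  bool handle(const Update& update) {
    if (aborted_) {
      return false; // An aborted driver never reaches the logic below.
    }

    if (uuids_.count(update.uuid) > 0) {
      return true; // Duplicate of an already processed update: ack only.
    }

    if (!callback_(update)) {
      aborted_ = true; // Set before any later update can be handled.
      return false;
    }

    assert(!aborted_); // Only a live driver records UUIDs.
    uuids_.insert(update.uuid); // Recorded only after the callback returns.
    return true;
  }

  bool aborted() const { return aborted_; }

private:
  std::function<bool(const Update&)> callback_;
  std::set<std::string> uuids_;
  bool aborted_ = false;
};
```

In this sketch a UUID can only be in the set if the callback returned for it
without aborting, so a duplicate U1 that hits the set implies the original U1
was fully processed. If the callback aborts on the original U1, every later
update, including a duplicate U1, is dropped without acknowledgement.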
