> On July 17, 2013, 12:15 a.m., Bill Farner wrote:
> > src/sched/sched.cpp, line 390
> > <https://reviews.apache.org/r/12603/diff/1/?file=322203#file322203line390>
> >
> >     I'm ignorant of the implications of this, but can you confirm/deny the 
> > following behavior?
> >     
> >     - Queue holds [U1, U2, U3, U4] which have yet to be processed.
> >     
> >     - Update U1 arrives, this code processes it.
> >     
> >     - Scheduler aborts.
> >     
> >     - New scheduler receives retried [U1, U2, U3, U4] (in any order)
> 
> Vinod Kone wrote:
>     Not sure which queue you are referring to, but I'm assuming you mean the 
> 'uuids' set?
>     
>     An update goes into 'uuids' only after it is processed (i.e., 
> Scheduler::statusUpdate() returns) by the scheduler.
>     
>     In the above scenario if a duplicate U1 is enqueued in the libprocess 
> queue and the scheduler aborts after handling the original U1, the driver 
> would've aborted and we would have never come here.
>     
>     When a new scheduler (and driver) becomes the leader they get updates 
> fresh from mesos.
>     
>     Does that make sense?
> 
> Bill Farner wrote:
>     I think you explained behavior for a slightly different scenario than 
> what I'm attempting to describe.
>     
>     - The driver has received [U1, U2, U3, U4], but the scheduler 
> implementation has yet to receive/ACK them.
>     
>     - A duplicate U1 arrives.
>     
>     - Scheduler aborts.
>     
>     What happens in that scenario?  Based on the verbiage in the diff, it 
> sounds as though U1 is ACKed to other parts of the system, and will not be 
> retried when the new scheduler takes over.
> 
> Vinod Kone wrote:
>     It is not possible for U1, U2, U3, and U4 to have been processed by the 
> driver while the scheduler has not yet processed U1. The scheduler should've 
> processed U1 before U2 can be processed by the driver, since it is a 
> synchronous call into the scheduler. Subsequently, if a duplicate U1 (i.e., 
> U1 is in 'uuids') is being processed by the driver, it means the driver did 
> not abort when it dealt with the original U1. If the driver had aborted 
> while handling the original U1, the 'aborted' flag would've been set before 
> the driver processed any other updates. Makes sense?
>     
>     I now realize that my comments in the code didn't do justice to the 
> subtlety of the semantics. Happy to expand the comments once you are 
> satisfied with the correctness.
>     
>

Thanks for the detail.  This and offline conversation helped clarify.  I think 
I can do a more competent review now :-)
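
For anyone following along, the argument above can be sketched roughly as 
below. This is a toy model, not the code in src/sched/sched.cpp: SketchDriver, 
Update, handle() and the bool-returning callback are all hypothetical names. 
It also assumes that a duplicate is acknowledged without being redelivered and 
that nothing is acknowledged once the driver has aborted.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for a status update; only the UUID matters here.
struct Update {
  std::string uuid;
};

// Hypothetical, simplified driver. It is NOT the Mesos scheduler driver.
class SketchDriver {
public:
  // 'callback' stands in for Scheduler::statusUpdate(). Returning false
  // models the scheduler aborting the driver from inside the callback.
  explicit SketchDriver(std::function<bool(const Update&)> callback)
    : callback_(std::move(callback)) {}

  // Returns true if an acknowledgement would be sent for this update.
  bool handle(const Update& update) {
    if (aborted_) {
      return false;  // Assumption: an aborted driver acks nothing.
    }
    if (uuids_.count(update.uuid) > 0) {
      return true;   // Assumption: a known duplicate is acked, not redelivered.
    }
    // Synchronous call: no later update is examined until this returns.
    if (!callback_(update)) {
      aborted_ = true;  // Set before any other update can be handled.
      return false;
    }
    uuids_.insert(update.uuid);  // Recorded only after the callback returned.
    return true;
  }

  bool aborted() const { return aborted_; }

private:
  std::function<bool(const Update&)> callback_;
  std::set<std::string> uuids_;
  bool aborted_ = false;
};

// Usage: a duplicate U1 is acknowledged but delivered to the scheduler once.
inline void demo() {
  int delivered = 0;
  SketchDriver driver([&](const Update&) { ++delivered; return true; });
  assert(driver.handle(Update{"U1"}));
  assert(driver.handle(Update{"U1"}));  // duplicate
  assert(delivered == 1);
}
```

In this model a duplicate can only hit the 'uuids' branch if the original was 
fully handled without an abort, which is the ordering Vinod describes.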


- Bill


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12603/#review23212
-----------------------------------------------------------


On July 17, 2013, 1:34 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/12603/
> -----------------------------------------------------------
> 
> (Updated July 17, 2013, 1:34 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Bugs: MESOS-551
>     https://issues.apache.org/jira/browse/MESOS-551
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/sched/sched.cpp 7ea82e547c612159c9fa24fb6d62e3d2b5f11982 
>   src/tests/status_update_manager_tests.cpp 
> 42395324dfe49659bee2229c6573ffef0874d923 
> 
> Diff: https://reviews.apache.org/r/12603/diff/
> 
> 
> Testing
> -------
> 
> make check (OSX and Linux)
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>
