> On Oct. 15, 2014, 4:03 a.m., Adam B wrote:
> > src/slave/slave.cpp, lines 938-948
> > <https://reviews.apache.org/r/26699/diff/1/?file=720970#file720970line938>
> >
> > Couldn't the Slave and the SUM get out of sync here? Right now, the SUM
> > will flush its pending status updates as soon as a new master is detected.
> > I'm imagining a scenario where the SUM is flushing status updates and
> > the slave handles a status ACK interleaved with a slave re-registration
> > delivering stale or out-of-sync task states.
> > Wouldn't it just be better if the SUM didn't flush until after the
> > slave has successfully re-registered?

Definitely thought about this race. Yes, it would be better if SUM did the
flush after re-registration but I think it is still a race because
re-registration could happen due to ZK blips where updates and acks are in
flight. I added a comment on why it is safe. Let me know if you still have
concerns.


> On Oct. 15, 2014, 4:03 a.m., Adam B wrote:
> > src/tests/slave_tests.cpp, lines 1088-1089
> > <https://reviews.apache.org/r/26699/diff/1/?file=720972#file720972line1088>
> >
> > Verify that it's actually a TASK_RUNNING?

I typically only test for things that the test is verifying, to avoid
bloating the test.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26699/#review56604
-----------------------------------------------------------


On Oct. 14, 2014, 6:03 p.m., Vinod Kone wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26699/
> -----------------------------------------------------------
>
> (Updated Oct. 14, 2014, 6:03 p.m.)
>
>
> Review request for mesos, Adam B, Ben Mahler, and Niklas Nielsen.
>
>
> Bugs: MESOS-1799 and MESOS-1817
>     https://issues.apache.org/jira/browse/MESOS-1799
>     https://issues.apache.org/jira/browse/MESOS-1817
>
>
> Repository: mesos-git
>
>
> Description
> -------
>
> Slave re-registration now sends both the latest state and unacknowledged
> state to the master.
>
>
> Diffs
> -----
>
>   src/slave/slave.hpp 342b09fc084c20d98d096bb129830440179c092c
>   src/slave/slave.cpp 0e342ed35e3db3b68f9f32b6cf4ace23e4a4db38
>   src/tests/fault_tolerance_tests.cpp a75910d4f486230ba3f1d8927e5f1e5fda6e287b
>   src/tests/slave_tests.cpp f585bdd20ae1af466f2c1b4d85331ac67451552f
>
> Diff: https://reviews.apache.org/r/26699/diff/
>
>
> Testing
> -------
>
> make check
>
> Ran new test 1000 times.
>
>
> Thanks,
>
> Vinod Kone
>
>
