-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13744/
-----------------------------------------------------------
(Updated Aug. 22, 2013, 10:25 p.m.)


Review request for mesos, Benjamin Hindman and Vinod Kone.


Bugs: MESOS-658
    https://issues.apache.org/jira/browse/MESOS-658


Repository: mesos-git


Description
-------

This is a split of https://reviews.apache.org/r/13699/ (which has ship-its) into two commits.

There was a case during re-registration where the re-registered time was not being set. This can cause a serious issue when the following occurs:

- Scheduler disconnects from the master; Master::exited(UPID) sets framework->active = false.
- Scheduler re-registers with ReregisterFrameworkMessage::failover=false. Currently, the master does _not_ update the re-registration time in this case!
- Now the failoverFramework timeout is set up in the Master.
- Scheduler disconnects again from the master; Master::exited(UPID) sets active=false once again.
- The original failoverFramework timeout fires and compares Framework->reregisteredTime. Since it has not been updated, the master proceeds to shut down the framework on all the slaves!

I'll file a bug for this and add it here.


Diffs
-----

  src/master/http.cpp 1ac84a9f75df43632ddbd1fec50333c159651f15
  src/master/master.hpp 30752d2698931624fdf4aa6e40ef9fc4ec58dc6d
  src/master/master.cpp d53b8bb97da45834790cca6e04b70b969a8d3453

Diff: https://reviews.apache.org/r/13744/diff/


Testing
-------

make check. I'll look into adding a test that exposes this issue.


Thanks,

Ben Mahler
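
P.S. For reviewers, the sequence in the Description can be sketched as a tiny standalone model. This is not the Mesos code: the struct layout, the integer clock, and the helpers reregister(), failoverTimeout() and simulate() are all hypothetical, and the sketch assumes the timeout is armed on disconnect and captures the re-registration time it later compares against. Only the names active and reregisteredTime come from the Description.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the master's per-framework state.
struct Framework {
  bool active = true;
  long reregisteredTime = 0;  // Logical tick of the last (re-)registration.
  bool shutDown = false;      // Set when the master tears the framework down.
};

// Scheduler re-registers. 'updateTime' selects between the behavior described
// as buggy (time left stale when failover=false) and the intended behavior.
void reregister(Framework& f, long now, bool updateTime) {
  f.active = true;
  if (updateTime) {
    f.reregisteredTime = now;
  }
}

// Failover timeout handler. 'expected' is the re-registration time captured
// when the timer was armed (an assumption of this sketch).
void failoverTimeout(Framework& f, long expected) {
  if (!f.active && f.reregisteredTime == expected) {
    f.shutDown = true;  // No re-registration observed: shut down.
  }
}

// Replays the sequence from the Description; returns true if the framework
// ends up shut down.
bool simulate(bool updateTime) {
  Framework f;
  f.reregisteredTime = 1;

  f.active = false;                    // First disconnect.
  long captured = f.reregisteredTime;  // Timer armed with the current time.

  reregister(f, 2, updateTime);        // Re-register with failover=false.

  f.active = false;                    // Second disconnect.

  failoverTimeout(f, captured);        // The original timer fires.
  return f.shutDown;
}
```

In this model, simulate(false) leaves the stale time in place, so the original timer cannot tell that a re-registration happened and shuts the framework down; simulate(true) refreshes the time and the stale timer becomes a no-op.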