-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13744/
-----------------------------------------------------------
(Updated Oct. 4, 2013, 6:34 p.m.)
Review request for mesos, Benjamin Hindman and Vinod Kone.
Changes
-------
Updated a comment.
Bugs: MESOS-658
https://issues.apache.org/jira/browse/MESOS-658
Repository: mesos-git
Description
-------
This splits https://reviews.apache.org/r/13699/ (which already has ship-its)
into two commits.
There was a case during re-registration where the re-registered time was not
being set.
This can cause a serious issue when the following occurs:
- Scheduler disconnects from the master; Master::exited(UPID) sets
framework->active = false.
- Scheduler re-registers with ReregisterFrameworkMessage::failover=false.
Currently, the master does _not_ update the re-registration time in this case!
- Now the failoverFramework timeout is set up in the Master.
- Scheduler disconnects again from the master; Master::exited(UPID) sets
active=false once again.
- The original failoverFramework timeout fires and checks
framework->reregisteredTime. Since it has not been updated, the master proceeds
to shut down the framework on all the slaves!
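To make the sequence above concrete, here is a minimal, self-contained sketch of
the race. It is not the Mesos code: Framework, FailoverTimeout, Master and
removedByStaleTimeout are hypothetical stand-ins, time is a plain counter, and
the sketch assumes the timeout remembers the re-registration time it saw when it
was scheduled and only acts if that value is unchanged when it fires.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model of the master's bookkeeping (not Mesos code).
struct Framework {
  bool active = true;
  long reregisteredTime = 0;  // logical tick of the last (re-)registration
  bool removed = false;
};

// Assumption: a pending failover timeout remembers the re-registration time
// it observed when the scheduler disconnected.
struct FailoverTimeout {
  long observedReregisteredTime;
};

struct Master {
  Framework framework;
  std::vector<FailoverTimeout> pending;
  long now = 0;
  bool updateTimeOnReregister;  // false models the bug, true models the fix

  explicit Master(bool fix) : updateTimeOnReregister(fix) {}

  // Scheduler disconnected: mark inactive and schedule a failover timeout.
  void exited() {
    ++now;
    framework.active = false;
    pending.push_back({framework.reregisteredTime});
  }

  // Scheduler re-registered with failover=false.
  void reregister() {
    ++now;
    framework.active = true;
    if (updateTimeOnReregister) {
      framework.reregisteredTime = now;
    }
  }

  // Timeout fired: remove the framework only if it is still inactive and has
  // not re-registered since the timeout was scheduled.
  void fire(const FailoverTimeout& timeout) {
    if (!framework.active &&
        framework.reregisteredTime == timeout.observedReregisteredTime) {
      framework.removed = true;
    }
  }
};

// Replays the scenario: disconnect, re-register, disconnect again, then let
// the *first* timeout fire. Returns true if the framework was removed.
bool removedByStaleTimeout(bool fix) {
  Master master(fix);
  master.exited();
  master.reregister();
  master.exited();
  assert(!master.framework.active);
  master.fire(master.pending.front());
  return master.framework.removed;
}
```

In this model, removedByStaleTimeout(false) reports that the stale timeout
removes the framework, while removedByStaleTimeout(true) leaves it alone
because the re-registration time no longer matches the one the timeout recorded.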
I'll file a bug for this and add it here.
Diffs (updated)
-----
src/master/master.hpp 0aeec7fc540d44c03c1171f31a7281a4b0055925
src/master/master.cpp ce8365f082a5f96ef64e33e526cb5047dff52127
Diff: https://reviews.apache.org/r/13744/diff/
Testing
-------
make check. I'll look into adding a test that exposes this issue.
Thanks,
Ben Mahler