-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13744/
-----------------------------------------------------------

(Updated Oct. 4, 2013, 6:34 p.m.)


Review request for mesos, Benjamin Hindman and Vinod Kone.


Changes
-------

Updated a comment.


Bugs: MESOS-658
    https://issues.apache.org/jira/browse/MESOS-658


Repository: mesos-git


Description
-------

This splits https://reviews.apache.org/r/13699/ (which already has ship-its) 
into two commits.

There was a case during re-registration where the framework's re-registration 
time was not being set.

This can cause a serious issue when the following sequence occurs:
 - Scheduler disconnects from the master; Master::exited(UPID) sets 
framework->active = false.
 - Scheduler re-registers with ReregisterFrameworkMessage::failover=false. 
Currently, the master does _not_ update the re-registration time in this case!
 - Now the failoverFramework timeout is set up in the master.
 - Scheduler disconnects again from the master; Master::exited(UPID) sets 
framework->active = false once again.
 - The original failoverFramework timeout fires and compares 
framework->reregisteredTime against the value it was scheduled with. Since the 
time has not been updated, the two match and the master proceeds to shut down 
the framework on all the slaves!
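
To make the sequence above concrete, here is a minimal, self-contained sketch 
of the race. It is not the actual master code: the Framework and Master 
structs, the logical clock, the pendingTimeouts queue, and the 
updateTimeOnReregister switch are all hypothetical simplifications, and it 
assumes the timeout captures reregisteredTime at the moment it is scheduled.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical, heavily simplified stand-in for the master's framework state.
struct Framework {
  bool active = true;
  long reregisteredTime = 0;  // Logical clock value, not a real timestamp.
  bool shutDown = false;
};

// Hypothetical model of the master; only the pieces relevant to the race.
struct Master {
  Framework framework;
  long clock = 0;
  bool updateTimeOnReregister;  // true models the fix, false models the bug.
  std::vector<std::function<void()>> pendingTimeouts;

  explicit Master(bool fix) : updateTimeOnReregister(fix) {}

  // Scheduler disconnected: deactivate and schedule a failover timeout that
  // remembers the re-registration time seen at scheduling (an assumption).
  void exited() {
    framework.active = false;
    long scheduledWith = framework.reregisteredTime;
    pendingTimeouts.push_back(
        [this, scheduledWith]() { failoverTimeout(scheduledWith); });
  }

  // Scheduler re-registered with failover=false.
  void reregister() {
    ++clock;
    framework.active = true;
    if (updateTimeOnReregister) {
      framework.reregisteredTime = clock;
    }
  }

  // Timeout fired: shut down only if the framework is inactive and has not
  // re-registered since this timeout was scheduled.
  void failoverTimeout(long scheduledWith) {
    if (!framework.active && framework.reregisteredTime == scheduledWith) {
      framework.shutDown = true;
    }
  }
};

// Replays the sequence from the description and reports whether the
// framework ended up shut down.
bool simulate(bool fix) {
  Master master(fix);
  master.exited();                   // First disconnect; timeout #1 scheduled.
  master.reregister();               // Re-registration with failover=false.
  master.exited();                   // Second disconnect; timeout #2 scheduled.
  master.pendingTimeouts.front()();  // Only the original timeout fires.
  return master.framework.shutDown;
}
```

In this model, simulate(false) ends with the framework shut down by the stale 
timeout, while simulate(true) leaves it running because the updated time no 
longer matches the value the first timeout was scheduled with.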

I'll file a bug for this and add it here.


Diffs (updated)
-----

  src/master/master.hpp 0aeec7fc540d44c03c1171f31a7281a4b0055925 
  src/master/master.cpp ce8365f082a5f96ef64e33e526cb5047dff52127 

Diff: https://reviews.apache.org/r/13744/diff/


Testing
-------

make check. I'll look into adding a test that exposes this issue.


Thanks,

Ben Mahler
