> On April 22, 2014, 4:07 p.m., Jiang Yan Xu wrote:
> > src/master/flags.hpp, line 90
> > <https://reviews.apache.org/r/20572/diff/1/?file=564783#file564783line90>
> >
> > I guess slave backoff can't really use this because it doesn't handle
> > "failover recovery" separately and still need to reregister within 75secs
> > in case it's a network/ZK blip.
>
> Vinod Kone wrote:
> if it's a ZK blip only at the slave, the master wouldn't realize the
> slave disconnection. so the slave can always bound its re-registration
> retries on this value irrespective of whether the master failed over or not.
> does that make sense?

If it's a full network blip and the slave fails to respond to pings, the master is going to start the 75sec countdown. After the network is restored and detected() is invoked, the slave needs to rush to reregister within 75secs, right? Either way, a backoff delay on the order of minutes is probably too large. Admittedly, such a large value can only be reached through exponential increase from previous failures, but these failures can be local and do not necessarily indicate an overloaded master.

- Jiang Yan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20572/#review41080
-----------------------------------------------------------


On April 23, 2014, 11:06 a.m., Vinod Kone wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20572/
> -----------------------------------------------------------
>
> (Updated April 23, 2014, 11:06 a.m.)
>
>
> Review request for mesos, Ben Mahler, Jie Yu, and Jiang Yan Xu.
>
>
> Bugs: MESOS-1226
>     https://issues.apache.org/jira/browse/MESOS-1226
>
>
> Repository: mesos-git
>
>
> Description
> -------
>
> See summary.
>
>
> Diffs
> -----
>
>   src/Makefile.am a44ea42ec73067b8f58c729c0d0f6413fa5da01d
>   src/local/local.cpp 297f35b7755a688a95e58777f7846aa0ff3e247f
>   src/master/constants.hpp 27ae4f89cfd1ddb7db287d650af160a690f93c26
>   src/master/constants.cpp ed966bc5bcc4dbb0f96b966efe33f179723c6759
>   src/master/flags.hpp acf39636bca8b259763d2679d7cd7a946a8aa043
>   src/master/main.cpp ec23781d2a1e687af031c060059de69079b179b4
>   src/master/master.cpp 0335b3416ee1c4d14a70e018ad9174b465035c5f
>   src/state/log.hpp e25d1e5e1daf9a5a8cd6b7c6c9c95c38b58f892d
>   src/tests/balloon_framework_test.sh f83240758b03871b8b53f45d0947c6171c9c3a93
>   src/tests/cluster.hpp 1862fe89a6c5897755133232d133dbf3664ed10a
>   src/tests/mesos.hpp 7bc5e981a468b81f0460e2736c8d0b76518302de
>   src/tests/mesos.cpp a9844e4cfef2eecbb30ca4bf1fa59d62edf93569
>   src/tests/registrar_zookeeper_tests.cpp PRE-CREATION
>   src/tests/script.cpp 09c7f3bfc8a4c3032116b90b44ca773deff4629d
>   src/zookeeper/group.cpp bdebc48e8ca793fa58cc0f9a0fc0daa5fb3a335e
>
> Diff: https://reviews.apache.org/r/20572/diff/
>
>
> Testing
> -------
>
> Added a new unit test that tests mesos cluster with registrar and zookeeper.
>
> Also, updated external tests to use log storage but without zookeeper.
>
> make check
>
>
> Thanks,
>
> Vinod Kone
>
>
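
To put the backoff concern from the discussion above in numbers, here is a minimal C++ sketch of capped exponential backoff. It is not the Mesos implementation: `cappedBackoff`, `fitsWindow`, the 1-second initial delay and the cap values are all hypothetical and chosen only for illustration. The only figure taken from the thread is the 75-second window.

```cpp
#include <algorithm>
#include <chrono>

using Seconds = std::chrono::seconds;

// Hypothetical helper (not a Mesos API): the delay doubles after every
// consecutive failure and never exceeds `cap`.
Seconds cappedBackoff(Seconds initial, Seconds cap, unsigned failures)
{
  Seconds delay = initial;

  // Stop doubling once the cap is reached; this also avoids overflow
  // for large failure counts.
  for (unsigned i = 0; i < failures && delay < cap; ++i) {
    delay *= 2;
  }

  return std::min(delay, cap);
}

// Hypothetical check: a retry scheduled `delay` from now can only land
// inside the master's countdown if the delay is shorter than the window.
// This is a necessary condition, not a sufficient one.
bool fitsWindow(Seconds delay, Seconds window)
{
  return delay < window;
}
```

Under these assumed numbers, seven consecutive failures with a 10-minute cap already produce a 128-second delay, which is past the 75-second window, while a 60-second cap keeps every delay inside it. That is the sense in which a cap on the order of minutes looks too large for the slave, whatever the cause of the earlier failures.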
