> On April 22, 2014, 11:07 p.m., Jiang Yan Xu wrote:
> > src/master/flags.hpp, line 90
> > <https://reviews.apache.org/r/20572/diff/1/?file=564783#file564783line90>
> >
> >     I guess slave backoff can't really use this because it doesn't handle
> >     "failover recovery" separately and still need to reregister within 75secs
> >     in case it's a network/ZK blip.
>
> Vinod Kone wrote:
>     if it's a ZK blip only at the slave, the master wouldn't realize the
>     slave disconnection. so the slave can always bound its re-registration
>     retries on this value irrespective of whether the master failed over or not.
>     does that make sense?
>
> Jiang Yan Xu wrote:
>     If it's a full network blip and the slave fails to respond to pings the
>     master is going to start the 75sec countdown. After network is restored and
>     detected() invoked, the slave needs to rush to reregister within 75secs right?
>
>     It's probably too large to have a back off delay in the order of minutes
>     no matter which case it is. Admittedly the large value has to be reached due
>     to exponential increase from previous failures but these failures can be
>     local and do not necessarily indicate an overloaded master.
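The exchange above turns on whether an exponentially growing retry delay can stay inside the 75-second re-registration window. Below is a minimal sketch of a capped exponential backoff with random jitter, to make the concern concrete. It is not the code under review: `backoffCeiling` and `jitteredDelay` are hypothetical helpers, and the initial delay and cap are assumed values chosen by the caller.

```cpp
#include <algorithm>
#include <chrono>
#include <random>

using std::chrono::milliseconds;

// Hypothetical helper, not Mesos code. Returns the upper bound of the
// backoff window for a 0-based retry attempt. The window starts at
// `initial`, doubles after every failed attempt, and never exceeds `cap`.
inline milliseconds backoffCeiling(
    milliseconds initial,
    milliseconds cap,
    unsigned attempt)
{
  milliseconds window = std::min(initial, cap);

  // Stop doubling once the cap is reached, so a large attempt count
  // cannot overflow the duration.
  for (unsigned i = 0; i < attempt && window < cap; ++i) {
    window = std::min<milliseconds>(window * 2, cap);
  }

  return window;
}

// Hypothetical helper, not Mesos code. Picks a delay uniformly in
// [0, window] so that many slaves retrying at once spread out.
// Assumes `window` is non-negative.
template <typename Rng>
milliseconds jitteredDelay(milliseconds window, Rng& rng)
{
  std::uniform_int_distribution<milliseconds::rep> dist(0, window.count());
  return milliseconds(dist(rng));
}
```

With an assumed 1-second initial delay and a 60-second cap, the window grows 1s, 2s, 4s, and so on, then stays at 60s from the seventh attempt onward. Keeping the cap below the 75-second countdown is what keeps a slave with many earlier local failures from sleeping past the deadline.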

Yan, can the slave do anything with exited notifications here?


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20572/#review41080
-----------------------------------------------------------


On April 23, 2014, 9:59 p.m., Vinod Kone wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20572/
> -----------------------------------------------------------
>
> (Updated April 23, 2014, 9:59 p.m.)
>
>
> Review request for mesos, Ben Mahler, Jie Yu, and Jiang Yan Xu.
>
>
> Bugs: MESOS-1226
>     https://issues.apache.org/jira/browse/MESOS-1226
>
>
> Repository: mesos-git
>
>
> Description
> -------
>
> See summary.
>
>
> Diffs
> -----
>
>   src/Makefile.am 364d63bb1f5dc8b63f72693eafd0b2feec231d13
>   src/local/local.cpp 297f35b7755a688a95e58777f7846aa0ff3e247f
>   src/master/constants.hpp 27ae4f89cfd1ddb7db287d650af160a690f93c26
>   src/master/constants.cpp ed966bc5bcc4dbb0f96b966efe33f179723c6759
>   src/master/flags.hpp acf39636bca8b259763d2679d7cd7a946a8aa043
>   src/master/main.cpp ec23781d2a1e687af031c060059de69079b179b4
>   src/master/master.cpp 0335b3416ee1c4d14a70e018ad9174b465035c5f
>   src/state/log.hpp e25d1e5e1daf9a5a8cd6b7c6c9c95c38b58f892d
>   src/tests/balloon_framework_test.sh f83240758b03871b8b53f45d0947c6171c9c3a93
>   src/tests/cluster.hpp 1862fe89a6c5897755133232d133dbf3664ed10a
>   src/tests/mesos.hpp 7bc5e981a468b81f0460e2736c8d0b76518302de
>   src/tests/mesos.cpp a9844e4cfef2eecbb30ca4bf1fa59d62edf93569
>   src/tests/registrar_zookeeper_tests.cpp PRE-CREATION
>   src/tests/script.cpp 09c7f3bfc8a4c3032116b90b44ca773deff4629d
>   src/zookeeper/group.cpp bdebc48e8ca793fa58cc0f9a0fc0daa5fb3a335e
>
> Diff: https://reviews.apache.org/r/20572/diff/
>
>
> Testing
> -------
>
> Added a new unit test that tests mesos cluster with registrar and zookeeper.
>
> Also, updated external tests to use log storage but without zookeeper.
>
> make check
>
>
> Thanks,
>
> Vinod Kone
>
>
