> On April 22, 2014, 11:07 p.m., Jiang Yan Xu wrote:
> > src/master/flags.hpp, line 90
> > <https://reviews.apache.org/r/20572/diff/1/?file=564783#file564783line90>
> >
> >     I guess slave backoff can't really use this because it doesn't handle 
> > "failover recovery" separately and still needs to reregister within 75secs 
> > in case it's a network/ZK blip.
> 
> Vinod Kone wrote:
>     if it's a ZK blip only at the slave, the master wouldn't realize the 
> slave disconnection. so the slave can always bound its re-registration 
> retries on this value irrespective of whether the master failed over or not. 
> does that make sense?
> 
> Jiang Yan Xu wrote:
>     If it's a full network blip and the slave fails to respond to pings, the 
> master is going to start the 75sec countdown. After the network is restored 
> and detected() is invoked, the slave needs to rush to reregister within 
> 75secs, right?
>     
>     A backoff delay on the order of minutes is probably too large no matter 
> which case it is. Admittedly, such a large value is only reached through 
> exponential increase from previous failures, but these failures can be 
> local and do not necessarily indicate an overloaded master.
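
To make the concern in the quoted comment concrete, here is a minimal sketch of capped exponential backoff checked against the 75sec window mentioned above. It is not Mesos code: the helper names `backoffDelay` and `firesWithinWindow` and the 1-second initial delay used in the note below are assumptions made purely for illustration, and the random jitter a real implementation would likely add is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

using std::chrono::seconds;

// Hypothetical helper (not part of Mesos): delay before the next
// re-registration attempt after `failures` consecutive failed attempts.
// The delay doubles from `initial` on every failure and is clamped to `cap`.
seconds backoffDelay(seconds initial, int failures, seconds cap)
{
  assert(initial.count() > 0 && failures >= 0);

  seconds delay = initial;
  // Stop doubling once the cap is reached so the value cannot overflow.
  for (int i = 0; i < failures && delay < cap; ++i) {
    delay *= 2;
  }
  return std::min(delay, cap);
}

// Hypothetical helper: true if a retry scheduled with `delay` would still
// fire inside the master's timeout window (75 seconds in this thread).
bool firesWithinWindow(seconds delay, seconds window)
{
  return delay < window;
}
```

Under the assumed 1-second initial delay and a 10-minute cap, six earlier failures give a 64-second delay, which still fits in the 75-second window, while a seventh gives 128 seconds and misses it, even if every one of those failures was local to the slave. Choosing a cap below the window (60 seconds, say) would keep every retry inside it, at the cost of less relief for a master that really is overloaded.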

Yan, can the slave do anything with exited notifications here?


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20572/#review41080
-----------------------------------------------------------


On April 23, 2014, 9:59 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20572/
> -----------------------------------------------------------
> 
> (Updated April 23, 2014, 9:59 p.m.)
> 
> 
> Review request for mesos, Ben Mahler, Jie Yu, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-1226
>     https://issues.apache.org/jira/browse/MESOS-1226
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am 364d63bb1f5dc8b63f72693eafd0b2feec231d13 
>   src/local/local.cpp 297f35b7755a688a95e58777f7846aa0ff3e247f 
>   src/master/constants.hpp 27ae4f89cfd1ddb7db287d650af160a690f93c26 
>   src/master/constants.cpp ed966bc5bcc4dbb0f96b966efe33f179723c6759 
>   src/master/flags.hpp acf39636bca8b259763d2679d7cd7a946a8aa043 
>   src/master/main.cpp ec23781d2a1e687af031c060059de69079b179b4 
>   src/master/master.cpp 0335b3416ee1c4d14a70e018ad9174b465035c5f 
>   src/state/log.hpp e25d1e5e1daf9a5a8cd6b7c6c9c95c38b58f892d 
>   src/tests/balloon_framework_test.sh f83240758b03871b8b53f45d0947c6171c9c3a93 
>   src/tests/cluster.hpp 1862fe89a6c5897755133232d133dbf3664ed10a 
>   src/tests/mesos.hpp 7bc5e981a468b81f0460e2736c8d0b76518302de 
>   src/tests/mesos.cpp a9844e4cfef2eecbb30ca4bf1fa59d62edf93569 
>   src/tests/registrar_zookeeper_tests.cpp PRE-CREATION 
>   src/tests/script.cpp 09c7f3bfc8a4c3032116b90b44ca773deff4629d 
>   src/zookeeper/group.cpp bdebc48e8ca793fa58cc0f9a0fc0daa5fb3a335e 
> 
> Diff: https://reviews.apache.org/r/20572/diff/
> 
> 
> Testing
> -------
> 
> Added a new unit test that tests mesos cluster with registrar and zookeeper.
> 
> Also, updated external tests to use log storage but without zookeeper.
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>
