> On April 15, 2014, 9:12 p.m., Ben Mahler wrote:
> > src/log/recover.cpp, line 113
> > <https://reviews.apache.org/r/18600/diff/9/?file=549174#file549174line113>
> >
> > If we're auto-initializing, shouldn't we 'watch' for the cluster size
> > as opposed to the quorum size to ensure we don't get stuck?

Beyond just watching for the cluster size to appear, what happens if the
watch is triggered but the replicas die (or are stopped by an operator)
before any messages are sent out? We don't want the replicas to get into
a completely blocked state, so we need some way of retrying the
auto-initialization after a timeout and bailing altogether after some
number of retries.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18600/#review40464
-----------------------------------------------------------


On April 4, 2014, 7:12 p.m., Jie Yu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18600/
> -----------------------------------------------------------
>
> (Updated April 4, 2014, 7:12 p.m.)
>
>
> Review request for mesos, Benjamin Hindman and Ben Mahler.
>
>
> Bugs: MESOS-984
>     https://issues.apache.org/jira/browse/MESOS-984
>
>
> Repository: mesos-git
>
>
> Description
> -------
>
> See summary.
>
>
> Diffs
> -----
>
>   src/log/log.hpp 6787c80
>   src/log/log.cpp 9dd992f
>   src/log/recover.hpp 634bc06
>   src/log/recover.cpp 688da5f
>   src/tests/log_tests.cpp 4f08927
>
> Diff: https://reviews.apache.org/r/18600/diff/
>
>
> Testing
> -------
>
> make check
>
>
> Thanks,
>
> Jie Yu
>
>
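To make the retry-then-bail suggestion in the reply above concrete, here is a rough sketch of the control flow. It is not taken from the patch and does not use the Mesos log or libprocess APIs. `autoInitializeWithRetry`, `InitResult`, and the `attempt` callback are hypothetical names, and the callback is assumed to run one timeout-bounded round of auto-initialization and report whether it succeeded.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Outcome of the hypothetical auto-initialization loop.
enum class InitResult { Initialized, GaveUp };

// Hypothetical helper, not part of the Mesos log API. `attempt` is
// assumed to run one round of auto-initialization bounded by `timeout`
// and to return true on success. The first attempt is followed by at
// most `maxRetries` retries; after that the loop bails out instead of
// leaving the replica blocked forever.
inline InitResult autoInitializeWithRetry(
    const std::function<bool(std::chrono::milliseconds)>& attempt,
    std::chrono::milliseconds timeout,
    int maxRetries)
{
  assert(maxRetries >= 0);

  for (int i = 0; i <= maxRetries; ++i) {
    if (attempt(timeout)) {
      return InitResult::Initialized;
    }
    // The round timed out, e.g. because replicas died after the watch
    // fired but before any messages were sent. Try again.
  }

  return InitResult::GaveUp;
}
```

In the real code this would presumably be expressed with futures and delayed dispatches rather than a blocking loop; the sketch only shows the bounded-retry policy so that a caller can surface a failure to the operator.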
