> On April 15, 2014, 9:12 p.m., Ben Mahler wrote:
> > src/log/recover.cpp, line 113
> > <https://reviews.apache.org/r/18600/diff/9/?file=549174#file549174line113>
> >
> >     If we're auto-initializing, shouldn't we 'watch' for the cluster size 
> > as opposed to the quorum size to ensure we don't get stuck?
> 
> Benjamin Hindman wrote:
>     Beyond just watching for the cluster size to appear, what happens when 
> this watch is triggered but, before any messages are sent out, the replicas 
> die (or are stopped by an operator)? We don't want the replicas to get into a 
> completely blocked state, so we need some way of retrying the 
> auto-initialization after a timeout and bailing altogether after some 
> number of retries.

I thought about this again this morning. First, we CANNOT watch for the cluster 
size, because we don't know whether we are in the auto-init case until we 
receive responses from the replicas. If we are in the catch-up case, we don't 
need all replicas to respond.
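
To make the distinction concrete, here is a rough sketch of the decision. It is 
not the code in the patch: the names Status, Decision and decide are 
hypothetical, and I am assuming that auto-init needs every replica to report 
an empty log while catch-up only needs a quorum of voting replicas.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified replica states (illustration only).
enum class Status { EMPTY, VOTING };

// What the recovering replica should do next.
enum class Decision { CATCH_UP, AUTO_INITIALIZE, WAIT };

// Decide from the responses received so far. The case is only known
// after looking at the responses, which is why we cannot pick the
// number of replicas to watch for up front.
Decision decide(const std::vector<Status>& responses,
                std::size_t clusterSize,
                std::size_t quorum)
{
  std::size_t voting = 0;
  std::size_t empty = 0;

  for (Status status : responses) {
    if (status == Status::VOTING) {
      ++voting;
    } else {
      ++empty;
    }
  }

  // Catch-up case: a quorum of voting replicas is enough.
  if (voting >= quorum) {
    return Decision::CATCH_UP;
  }

  // Auto-init case (assumption): every replica must report EMPTY.
  if (empty == clusterSize) {
    return Decision::AUTO_INITIALIZE;
  }

  // Not enough information yet; keep waiting for responses.
  return Decision::WAIT;
}
```

With a cluster of 3 and a quorum of 2, two VOTING responses are enough to 
start catching up, while two EMPTY responses are not enough to auto-initialize.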

So I plan to do what BenH suggested: add a retry timeout for log recovery 
(retry the recovery when the timeout expires).
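
Roughly, the retry logic would look like the sketch below. This is only an 
illustration using the standard library, not the libprocess-based code in the 
patch. The names Attempt and recoverWithRetry are hypothetical, and the retry 
limit is an assumption taken from BenH's comment about bailing eventually.

```cpp
#include <chrono>
#include <functional>

// Hypothetical: one recovery attempt. It is expected to give up and
// return false once `timeout` has elapsed without enough responses.
using Attempt = std::function<bool(std::chrono::milliseconds)>;

// Run the recovery, retrying after each timeout. Returns true on
// success, false after the initial attempt plus `maxRetries` retries
// have all timed out.
bool recoverWithRetry(const Attempt& attempt,
                      std::chrono::milliseconds timeout,
                      int maxRetries)
{
  for (int i = 0; i <= maxRetries; ++i) {
    if (attempt(timeout)) {
      return true;
    }
    // The attempt timed out (e.g., replicas died before responding);
    // fall through and start the recovery again.
  }

  return false; // Bail instead of blocking forever.
}
```

The point is that a replica never stays blocked on a single attempt: either a 
later attempt succeeds or the caller gets an explicit failure to act on.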


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18600/#review40464
-----------------------------------------------------------


On April 4, 2014, 7:12 p.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18600/
> -----------------------------------------------------------
> 
> (Updated April 4, 2014, 7:12 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Bugs: MESOS-984
>     https://issues.apache.org/jira/browse/MESOS-984
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/log/log.hpp 6787c80 
>   src/log/log.cpp 9dd992f 
>   src/log/recover.hpp 634bc06 
>   src/log/recover.cpp 688da5f 
>   src/tests/log_tests.cpp 4f08927 
> 
> Diff: https://reviews.apache.org/r/18600/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>
