On Thu, Mar 6, 2014 at 8:29 AM, Ivan Kelly <[email protected]> wrote:

> > OK, this comment is not entirely clear to me. I thought in your
> > example you had ensemble 3, quorum 2, and you had lost both B2 and
> > B3. In that case, you already lost quorum. Not for L1, but at that
> > point there are cases in which you don't know if you've lost a
> > record. In the specific scenario you describe, we know there is no
> > record 1 because there is no record 0, fine. But, if you had a
> > record 0, then we wouldn't know if we lost a record and consequently
> > the ledger is broken. We may be able to fix this particular case by
> > simply (not) replicating what we have and declaring success, but it
> > is not a general solution, I'm afraid.
> After we lose the first bookie, B3, we are able to detect that the
> ledger is empty and that a bookie is down. However, we don't do
> anything at this point, because the bookie which is down isn't in the
> quorum for the first entry of the ledger. The problem is that we only
> start to perceive it when the second bookie, B2, goes
> down.
>
> My point is that we need to deal with the issue when the first bookie
> goes down.
>

Just curious: isn't this handled by the writer changing the ensemble? Unless
the ledger is idle and no longer in use.
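
To make the ensemble 3, quorum 2 example quoted above concrete, here is a minimal sketch of which bookies hold which entry. It assumes round-robin striping, where entry e goes to the bookies at positions e, e+1, ... (mod ensemble size). The function names are hypothetical and only the bookie labels B1, B2, B3 come from the thread.

```python
def write_set(entry_id, ensemble, write_quorum):
    """Bookies that store entry_id, assuming round-robin striping."""
    n = len(ensemble)
    return [ensemble[(entry_id + i) % n] for i in range(write_quorum)]


def surviving_replicas(entry_id, ensemble, write_quorum, failed):
    """Bookies in the entry's write set that are not in the failed set."""
    return [b for b in write_set(entry_id, ensemble, write_quorum)
            if b not in failed]


if __name__ == "__main__":
    ensemble = ["B1", "B2", "B3"]
    failed = {"B2", "B3"}
    for entry_id in range(3):
        print(entry_id,
              write_set(entry_id, ensemble, 2),
              surviving_replicas(entry_id, ensemble, 2, failed))
```

Under that assumption, B3 is not in the write set of entry 0, which is why losing B3 alone goes unnoticed on an empty ledger. Once B2 is also gone, entry 1 (if it existed) would have no surviving replica.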


>
> >
> > >>
> > >>
> > >>>> the postponing is already there, since the ledger couldn't be
> > >>>> opened and fenced.
> > >>
> > >> Yeah Sijie, you are right, it will be postponed to the next cycle.
> > >> AFAIK the AutoRecovery feature will keep trying to open it again and
> > >> again, and this cycle will never end. It is a kind of hang too.
> > > Actually, it's a little worse than that. The recovery worker will
> > > acquire the lock on the unreplicated node, try to open, release the
> > > lock, and repeat ad infinitum, without any pause between loops. This
> > > will create a lot of write traffic on zookeeper for the locks.
> >
> >
> > Ok, thanks for the clarification. Having an unbounded number of
> > attempts is definitely not good. Independent of how we solve this
> > problem, I was thinking about keeping track of the number of
> > attempts.
> Ya, adding a rate limiter would probably be enough.
>
>
> -Ivan
>
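
As an illustration of the attempt tracking and rate limiting suggested above, here is a rough sketch of a per-ledger backoff that a recovery worker could consult before taking the lock. This is not the AutoRecovery code: the class name, method names, and delay values are all assumptions made for the example, and the clock is passed in explicitly to keep it self-contained.

```python
class AttemptTracker:
    """Hypothetical per-ledger attempt counter with exponential backoff."""

    def __init__(self, base_delay=1.0, max_delay=60.0):
        self.base_delay = base_delay   # seconds to wait after the first failure
        self.max_delay = max_delay     # upper bound on the wait
        self.attempts = {}             # ledger_id -> consecutive failed attempts
        self.next_allowed = {}         # ledger_id -> earliest time of next try

    def may_attempt(self, ledger_id, now):
        """True if the worker may try (and lock) this ledger at time `now`."""
        return now >= self.next_allowed.get(ledger_id, 0.0)

    def record_failure(self, ledger_id, now):
        """Count a failed open and return the delay before the next try."""
        count = self.attempts.get(ledger_id, 0) + 1
        self.attempts[ledger_id] = count
        # Cap the exponent so the power stays small for very large counts.
        exponent = min(count - 1, 30)
        delay = min(self.max_delay, self.base_delay * (2 ** exponent))
        self.next_allowed[ledger_id] = now + delay
        return delay

    def record_success(self, ledger_id):
        """Forget the ledger once it has been opened and replicated."""
        self.attempts.pop(ledger_id, None)
        self.next_allowed.pop(ledger_id, None)
```

A worker would check `may_attempt` before acquiring the lock and skip the ledger otherwise, so a ledger that cannot be opened no longer causes a lock acquire and release on every pass. The `attempts` count could also be used to give up or raise an alert after some threshold.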
