On Thu, Mar 6, 2014 at 8:29 AM, Ivan Kelly <[email protected]> wrote:

> > OK, this comment is not entirely clear to me. I thought in your
> > example you had ensemble 3, quorum 2, and you had lost both B2 and
> > B3. In that case, you already lost quorum. Not for L1, but at that
> > point there are cases in which you don't know if you've lost a
> > record. In the specific scenario you describe, we know there is no
> > record 1 because there is no record 0, fine. But, if you had a
> > record 0, then we wouldn't know if we lost a record and consequently
> > the ledger is broken. We may be able to fix this particular case by
> > simply (not) replicating what we have and declaring success, but it
> > is not a general solution, I'm afraid.
> After we lose the first bookie, B3, we are able to detect that the
> ledger is empty and that a bookie is down. However, we don't do
> anything at this point, because the bookie which is down isn't in the
> quorum for the first entry of the ledger. The problem, is that we only
> ever start to perceive the problem when the second bookie, B2 goes
> down.
>
> My point is that we need to deal with the issue when the first bookie
> goes down.
>

Just curious: isn't this handled by the writer changing the ensemble?
Unless the ledger is idle and no longer in use.

> > >>
> > >>>> the postponing is already there, since the ledger couldn't be
> > >>>> opened and fenced.
> > >>
> > >> Yeah Sijie you are right, it will postpone to next cycle.
> > >> AFAIK AutoRecovery feature will keep on trying to open it again and
> > >> again, this cycle will never ends. It is a kind of hanging too.
> > > Actually, it's a little worse than that. The recovery worker will
> > > acquire the lock on the unreplicated node, try to open, release the
> > > lock, and repeat ad infinitum, without any pause between loops. This
> > > will create a lot of write traffic on zookeeper for the locks.
> > >
> >
> > Ok, thanks for the clarification. Having an unbounded number of
> > attempts is definitely not good. Independent of how we solve this
> > problem, I was thinking about keeping track of the number of
> > attempts.
> Ya, adding a ratelimiter would probably be enough.
>
> -Ivan
>
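
To make the two ideas quoted above concrete (counting attempts, and pausing
between them), here is a minimal sketch. It is not BookKeeper code: the names
open_with_backoff, backoff_delays and try_open are hypothetical, and try_open
simply stands in for "acquire the lock, try to open the ledger, release the
lock". It uses bounded retries with capped exponential backoff rather than a
true token-bucket rate limiter, which is only one way to get a similar effect.

```python
import time
from typing import Callable, List, Tuple


def backoff_delays(max_attempts: int, base_delay: float, max_delay: float) -> List[float]:
    """Pauses to take after each failed attempt except the last one."""
    return [min(max_delay, base_delay * (2 ** i)) for i in range(max_attempts - 1)]


def open_with_backoff(
    try_open: Callable[[], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[bool, int]:
    """Call try_open until it succeeds or max_attempts is reached.

    Returns (succeeded, attempts_made). try_open is a hypothetical
    stand-in for one lock/open/unlock cycle of the recovery worker.
    """
    delays = backoff_delays(max_attempts, base_delay, max_delay)
    for attempt in range(1, max_attempts + 1):
        if try_open():
            return True, attempt
        if attempt < max_attempts:
            # Pause instead of looping hot, so each failure costs
            # fewer lock writes per unit of time.
            sleep(delays[attempt - 1])
    return False, max_attempts
```

With a worker shaped like this, a ledger that can never be opened costs at
most max_attempts lock cycles per pass, and the caller gets the attempt count
back so it can decide whether to give up or flag the ledger.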
