Hi Ivan,

Here is the JIRA for this issue: BK-336. Is it worth discussing there, so that we do not lose comments here and there?

Regards,
Uma
________________________________________
From: Ivan Kelly [[email protected]]
Sent: Wednesday, July 11, 2012 9:53 PM
To: [email protected]
Subject: Re: Read entry performance when bookie in the ensemble in unreachable.....

This will absolutely kill performance in the case of a failed bookie,
though. In an e3q2 ledger with a 10 second timeout, performance will drop
to about 0.3 entries per second, because we hit a 10 second delay on every
third entry. A bookie failing is not a corner case.

As for blacklisting: if a read from a bookie fails, we could refrain from
reading from that bookie again, if and only if the bookie does not exist in
/ledgers/available.

A better solution, but one that would take a lot of changes, would be to
change how we do reads to include a readahead. The general use case for
bookkeeper is write-ahead logging, so clients will generally not be
interested in reading a single entry. They want to read an entry and every
subsequent entry in that ledger. So when we read from a ledger, we should
read N entries per RPC and cache them in the readOp. At the moment we read
one at a time. Of course, one bookie will not have all the entries, so we
still need to read from multiple sources. I think this change should wait
for 4.3, though.

-Ivan

On Tue, Jul 10, 2012 at 02:14:45PM +0200, Flavio Junqueira wrote:
> I'm not sure blacklisting is a good idea; the bookie could simply be a little
> slow at that point for whatever reason. It might be a better idea to have a
> separate timeout for reads. We send read requests to the quorum sequentially
> to optimize the use of bandwidth, but it is clear that slow/dead bookies can
> hurt performance. We have essentially optimized for what we believed to be
> the common case.
>
> -Flavio
>
> On Jul 10, 2012, at 12:52 PM, Ivan Kelly wrote:
>
> > Hmmm, perhaps we should add a blacklist to the PendingReadOp, or we
> > could decide to only read from bookies which are in /ledgers/available
> >
> > Could you open a JIRA for this?
> >
> > Cheers
> > Ivan
> >
> > On Fri, Jul 06, 2012 at 04:55:10PM +0000, Rakesh R wrote:
> >> Hi All,
> >>
> >> Tested the following scenario with Bookkeeper-4.1 release:
> >> ----------------------------------------------------------
> >> Scenario:
> >> Prerequisites -> Say BK1, BK2, BK3
> >> 1) Create a ledger with, say, ensemble=3 quorum=2 and add, say, 100 entries.
> >> 2) After this, the BK1 machine is isolated by unplugging the network cable of BK1.
> >>
> >> What I've observed is that the bkclient waits nearly 10 seconds for the
> >> connection timeout from the failed bookie, and only after that retries
> >> the other bookie... I'm thinking this will affect read entry performance.
> >>
> >> Also, I could see that if the machine is reachable (and just the bookie
> >> server is shut down), the timeout is very short and the client immediately
> >> retries the other bookie in the ensemble.
> >>
> >> Is there anything I'm missing to configure in the netty channels or on the
> >> bookie server side for the bookie connection timeout?
> >>
> >> Following is the sample log from the test environment:
> >> ------------------------------------------------------
> >>
> >> 2012-07-06 22:07:59,734 ERROR hidden.bkjournal.org.apache.bookkeeper.client.PendingReadOp: Bookie handle is not available while reading entry: 18 ledgerId: 3 from bookie: /HOST1:3181
> >>
> >> 2012-07-06 22:08:10,234 ERROR hidden.bkjournal.org.apache.bookkeeper.proto.PerChannelBookieClient: Could not connect to bookie: /HOST1:3181
> >> 2012-07-06 22:08:10,234 ERROR hidden.bkjournal.org.apache.bookkeeper.client.PendingReadOp: Bookie handle is not available while reading entry: 21 ledgerId: 3 from bookie: /HOST1:3181
> >> 2012-07-06 22:08:21,234 ERROR hidden.bkjournal.org.apache.bookkeeper.proto.PerChannelBookieClient: Could not connect to bookie: /HOST1:3181
> >> 2012-07-06 22:08:21,234 ERROR hidden.bkjournal.org.apache.bookkeeper.client.PendingReadOp: Bookie handle is not available while reading entry: 24 ledgerId: 3 from bookie: /HOST1:3181
> >> 2012-07-06 22:08:31,734 ERROR hidden.bkjournal.org.apache.bookkeeper.proto.PerChannelBookieClient: Could not connect to bookie: /HOST1:3181
> >> 2012-07-06 22:08:31,734 ERROR hidden.bkjournal.org.apache.bookkeeper.client.PendingReadOp: Bookie handle is not available while reading entry: 27 ledgerId: 3 from bookie: /HOST1:3181
> >> 2012-07-06 22:08:42,734 ERROR hidden.bkjournal.org.apache.bookkeeper.proto.PerChannelBookieClient: Could not connect to bookie: /HOST1:3181
> >> 2012-07-06 22:08:42,734 ERROR hidden.bkjournal.org.apache.bookkeeper.client.PendingReadOp: Bookie handle is not available while reading entry: 30 ledgerId: 3 from bookie: /HOST1:3181
> >> 2012-07-06 22:08:53,234 ERROR hidden.bkjournal.org.apache.bookkeeper.proto.PerChannelBookieClient: Could not connect to bookie: /HOST1:3181
> >>
> >> Thanks & Regards
> >> Rakesh R
>
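
To make the cost discussed in this thread concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch and not BookKeeper code: the function names, the round-robin placement of entries, the zero-cost healthy read, and the `skip_known_dead` flag are all assumptions made for illustration. The placement assumption is at least consistent with the log above, where entries 18, 21, 24, 27 and 30 each wait on the unreachable bookie. The skip flag stands in for the blacklisting idea and assumes the dead bookie really is absent from /ledgers/available.

```python
def write_set(entry_id, ensemble_size, quorum_size):
    """Bookie indexes assumed to hold entry_id under round-robin striping."""
    return [(entry_id + i) % ensemble_size for i in range(quorum_size)]


def read_time(num_entries, ensemble_size, quorum_size, dead, timeout_s,
              healthy_read_s=0.0, skip_known_dead=False):
    """Seconds needed to read entries 0..num_entries-1 one at a time.

    Replicas are tried sequentially. A dead bookie costs a full timeout.
    With skip_known_dead, a bookie that already timed out is not retried.
    """
    known_dead = set()
    total = 0.0
    for entry_id in range(num_entries):
        for bookie in write_set(entry_id, ensemble_size, quorum_size):
            if skip_known_dead and bookie in known_dead:
                continue  # do not pay the timeout again
            if bookie in dead:
                total += timeout_s
                known_dead.add(bookie)
                continue  # fall through to the next replica
            total += healthy_read_s
            break
        else:
            raise RuntimeError("no live replica for entry %d" % entry_id)
    return total


# e3q2 ledger, bookie 0 unreachable, 10 second connection timeout.
slow = read_time(99, 3, 2, dead={0}, timeout_s=10.0)
fast = read_time(99, 3, 2, dead={0}, timeout_s=10.0, skip_known_dead=True)
print("without skipping:", slow, "seconds")
print("with skipping:   ", fast, "seconds")
```

Under these assumptions, reading 99 entries takes 330 seconds without skipping (about 0.3 entries per second) and 10 seconds with skipping, since only the first read pays the timeout. A separate, shorter read timeout would shrink `timeout_s` instead, which the same function can model.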
