Thanks Flavio and Ivan. Please find my replies inline.

Flavio,

Yes, we are using trunk with only a few stats-related modifications to bookkeeper. Also, I meant < 1 ms and not 1 second. It's a JMX stat value checked using jconsole, and I believe it is in ms. Our entries are small (around 1KB) and we have a lot of entries in our ledgers (we see close to 20k publishes/second on our hedwig cluster).
We have 15 bookie servers with an ensemble size of 4 and a quorum of 3.

On Fri, Sep 21, 2012 at 2:59 AM, Ivan Kelly <[email protected]> wrote:

> What latency are you getting for the reads? Is the latency too high
> for reading individual entries, or for reading a whole ledger?
>
The latency for reading individual entries is really high, on the order of seconds. This almost always occurs when a backed-up subscriber is catching up and there are simultaneous publishes. There are around 15k-20k read entry requests per second across the 15 bookies (this is the sum of the PCBC requests per second) and around 10k add entry requests per second. From what I can tell, if there are only writes, the latencies are not high.

>
> Also, how many topics do you have in the system? The number of topics
> will define the amount of interleaving in the ledger storage files
> which has an effect on how long it will take to read entries from the
> ledger.
>
We have 1000 topics.

>
> For the disk, check the number of I/O transactions which are
> occurring. I don't think you can get it with dstat, but sar -b should
> give it to you. I suspect a lot of seeks are occurring.
>
Thanks for the tip. I'll do this the next time we run into such a problem.

>
> There's a couple of shortcomings in how bookies serve reads which
> I've wanted to look at for a while, so this could be a good chance.
>
Could you give some information on what those shortcomings are? Also, do let me know if you need any more information from our end.

>
> -Ivan
>
> On Thu, Sep 20, 2012 at 07:42:29PM -0700, Aniruddha Laud wrote:
> > I'm logging latencies in the per channel bookie client by overriding the
> > read completion and add completion callbacks.
> >
> > When a hedwig subscriber subscribes to a topic and it hasn't consumed any
> > messages for a long time, it has to issue requests to bookkeeper to read
> > entries. We are observing very high latencies during these operations.
> >
> > throttle limit for the bookkeeper client is set at 5000. We are using the
> > Hierarchical ledger manager.
> >
> > The bookies have 5 disks. 1 for the journal, 4 for the ledgers. On the
> > bookie server pageSize=32768, open file limit = 20000. I checked a few
> > bookies and the number of open ledgers was around 5000 on each. However,
> > the add and read latencies on the servers were less than 1 second
> > (measured using jconsole and the exposed jmx stats).
> >
> > Also, the bookie disks are standard spin drives and were doing about
> > 40MBps reads and 10MBps writes when measured with dstat. Any thoughts
> > would be helpful.
>

Regards,
Aniruddha.
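
P.S. In case it helps put the numbers above in context, here is a rough back-of-envelope sketch of the per-bookie load they imply. It assumes requests are spread evenly across the 15 bookies, that the 10k add entry requests per second is an aggregate figure like the read figure, and that entries are exactly 1 KiB. None of this is measured per bookie; the function names are just for illustration.

```python
# Back-of-envelope per-bookie load from the figures quoted in this thread.
# Assumptions (not measured): even spread across bookies, aggregate add rate,
# and entries of exactly 1 KiB ("around 1KB" in the thread).

NUM_BOOKIES = 15
ENTRY_SIZE_BYTES = 1024
READ_REQS_PER_SEC = (15_000, 20_000)  # aggregate low/high estimate
ADD_REQS_PER_SEC = 10_000             # assumed to be aggregate as well


def per_bookie(aggregate_rate, num_bookies=NUM_BOOKIES):
    """Average request rate per bookie, assuming an even spread."""
    return aggregate_rate / num_bookies


def payload_mb_per_sec(reqs_per_sec, entry_size=ENTRY_SIZE_BYTES):
    """Entry payload throughput in MB/s (10^6 bytes) for a request rate."""
    return reqs_per_sec * entry_size / 1e6


if __name__ == "__main__":
    for label, rate in (("reads (low)", READ_REQS_PER_SEC[0]),
                        ("reads (high)", READ_REQS_PER_SEC[1]),
                        ("adds", ADD_REQS_PER_SEC)):
        r = per_bookie(rate)
        print(f"{label}: {r:.0f} req/s per bookie, "
              f"{payload_mb_per_sec(r):.2f} MB/s of entry payload")
```

Under those assumptions each bookie would serve roughly 1000 to 1333 read requests per second, which is only about 1.0 to 1.4 MB/s of entry payload. That is well below the 40MBps of disk reads seen in dstat, so the disks appear to be reading much more than the payload alone would require, though this estimate by itself does not say why.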
