Thanks Flavio and Ivan. Please find my replies inline.

Flavio,
Yes, we are using trunk with only a few stats-related modifications to
bookkeeper. Also, I meant < 1 ms, not 1 second. It's a JMX stat value
checked using jconsole, and I believe it is in ms. Our entries are small
(around 1KB) and we have a lot of entries in our ledgers (we see close to
20k publishes/second on our hedwig cluster).

We have 15 bookie servers with an ensemble size of 4 and quorum of 3.

On Fri, Sep 21, 2012 at 2:59 AM, Ivan Kelly <[email protected]> wrote:

> What latency are you getting for the reads? Is the latency too high
> for reading individual entries, or for reading a whole ledger?
>
The latency for reading individual entries is really high, on the order of
seconds. This almost always occurs when a backed-up subscriber is catching
up and there are simultaneous publishes. There are around 15k-20k read
entry requests per second across the 15 bookies (this is the sum of the
PCBC requests per second) and around 10k add entry requests per second.
From what I can tell, if there are only writes, the latencies are not high.

>
> Also, how many topics do you have in the system? The number of topics
> will define the amount of interleaving in the ledger storage files
> which has an effect on how long it will take to read entries from the
> ledger.
>
We have 1000 topics.

>
> For the disk, check the number of I/O transactions which are
> occurring. I don't think you can get it with dstat, but sar -b should
> give it to you. I suspect a lot of seeks are occurring.
>
Thanks for the tip. I'll do this the next time we run into such a problem.

>
> There are a couple of shortcomings in how bookies serve reads which
> I've wanted to look at for a while, so this could be a good chance.
>
Could you give some information on what those shortcomings are? Also, do
let me know if you need any more information from our end.

>
> -Ivan
>
>
>
> On Thu, Sep 20, 2012 at 07:42:29PM -0700, Aniruddha Laud wrote:
> > I'm logging latencies in the per channel bookie client by overriding the
> > read completion and add completion callbacks.
> >
> > When a hedwig subscriber subscribes to a topic and it hasn't consumed any
> > messages for a long time, it has to issue requests to bookkeeper to read
> > entries. We are observing very high latencies during these operations.
> >
> > throttle limit for the bookkeeper client is set at 5000. We are using the
> > Hierarchical ledger manager.
> >
> > The bookies have 5 disks. 1 for the journal, 4 for the ledgers. On the
> > bookie server pageSize=32768, open file limit = 20000. I checked a few
> > bookies and the number of open ledgers was around 5000 on each. However,
> > the add and read latencies on the servers were less than 1 second
> > (measured using jconsole and the exposed jmx stats).
> >
> > Also, the bookie disks are standard spin drives and were doing about
> > 40MBps reads and 10MBps writes when measured with dstat. Any thoughts
> > would be helpful.
>

Regards,
Aniruddha.
