Yeah, I was going to say that if your loading is mostly reads, you can probably go up from the 0.2 of heap given over to the block cache. I like Dan's suggestion of trying it first on one server, if you can.
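For the archives, the change being discussed is just this in the test server's hbase-site.xml (0.4 is the trial value Dan suggests below, not a general recommendation; the region server needs a restart to pick it up):

```xml
<!-- hbase-site.xml on the one region server being tested -->
<property>
  <name>hfile.block.cache.size</name>
  <!-- Fraction of the region server heap given to the block cache.
       The default is 0.2; 0.4 is the trial value from this thread. -->
  <value>0.4</value>
</property>
```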
St.Ack

On Mon, Feb 15, 2010 at 5:22 PM, Dan Washusen <d...@reactive.org> wrote:
> So roughly 72% of reads use the blocks held in the block cache...
>
> It would be interesting to see the difference between when it was working OK
> and now. Could you try increasing the memory allocated to one of the region
> servers and also increasing the "hfile.block.cache.size" to say '0.4' on the
> same server?
>
> On 16 February 2010 11:54, James Baldassari <ja...@dataxu.com> wrote:
>
>> Hi Dan. Thanks for your suggestions. I am doing writes at the same
>> time as reads, but there are usually many more reads than writes. Here
>> are the stats for all three region servers:
>>
>> Region Server 1:
>> request=0.0, regions=15, stores=16, storefiles=34, storefileIndexSize=3,
>> memstoreSize=308, compactionQueueSize=0, usedHeap=3096, maxHeap=4079,
>> blockCacheSize=705474544, blockCacheFree=150032400, blockCacheCount=10606,
>> blockCacheHitRatio=76, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
>>
>> Region Server 2:
>> request=0.0, regions=16, stores=16, storefiles=39, storefileIndexSize=4,
>> memstoreSize=225, compactionQueueSize=0, usedHeap=3380, maxHeap=4079,
>> blockCacheSize=643172800, blockCacheFree=212334144, blockCacheCount=9660,
>> blockCacheHitRatio=69, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
>>
>> Region Server 3:
>> request=0.0, regions=13, stores=13, storefiles=31, storefileIndexSize=4,
>> memstoreSize=177, compactionQueueSize=0, usedHeap=1905, maxHeap=4079,
>> blockCacheSize=682848608, blockCacheFree=172658336, blockCacheCount=10262,
>> blockCacheHitRatio=72, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
>>
>> The average blockCacheHitRatio is about 72. Is this too low? Anything
>> else I can check?
>>
>> -James
>>
>> On Mon, 2010-02-15 at 18:16 -0600, Dan Washusen wrote:
>>> Maybe the block cache is thrashing?
>>>
>>> If you are regularly writing data to your tables then it's possible that
>>> the block cache is no longer being effective. On the region server web UI,
>>> check the blockCacheHitRatio value. You want this value to be high (0-100).
>>> If this value is low, it means that HBase has to go to disk to fetch blocks
>>> of data. You can control the amount of JVM memory that HBase allocates to
>>> the block cache using the "hfile.block.cache.size" property (the default is
>>> 0.2, i.e. 20% of the heap).
>>>
>>> Cheers,
>>> Dan
>>>
>>> On 16 February 2010 10:45, James Baldassari <ja...@dataxu.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Does anyone have any tips to share regarding optimization for random
>>>> read performance? For writes I've found that setting a large write
>>>> buffer and setting auto-flush to false on the client side significantly
>>>> improved put performance. Are there any similar easy tweaks to improve
>>>> random read performance?
>>>>
>>>> I'm using HBase 0.20.3 in a very read-heavy real-time system with 1
>>>> master and 3 region servers. It was working OK for a while, but today
>>>> there was a severe degradation in read performance. Restarting Hadoop
>>>> and HBase didn't help, and there are no errors in the logs. Read
>>>> performance starts off around 1,000-2,000 gets/second but quickly
>>>> (within minutes) drops to around 100 gets/second.
>>>>
>>>> I've already looked at the performance tuning wiki page. On the server
>>>> side I've increased hbase.regionserver.handler.count from 10 to 100,
>>>> but it didn't help. Maybe this is expected because I'm only using a
>>>> single client to do reads. I'm working on implementing a client pool
>>>> now, but I'm wondering if there are any other settings on the server or
>>>> client side that might improve things.
>>>>
>>>> Thanks,
>>>> James
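One way to sanity-check the stats quoted above: blockCacheSize + blockCacheFree should come out at roughly hfile.block.cache.size x maxHeap, which tells you the cache really is sized as configured and how full it is. A small stand-alone sketch using Region Server 1's figures (plain Java arithmetic, no HBase classes involved):

```java
public class BlockCacheCheck {
    public static void main(String[] args) {
        // Figures reported by Region Server 1 in the thread above
        long blockCacheSize  = 705474544L;  // bytes currently held by cached blocks
        long blockCacheFree  = 150032400L;  // bytes still free in the cache
        long maxHeapMb       = 4079L;       // maxHeap metric, in MB
        double cacheFraction = 0.2;         // default hfile.block.cache.size

        long capacity   = blockCacheSize + blockCacheFree;         // 855506944
        double expected = maxHeapMb * 1024 * 1024 * cacheFraction; // ~855 MB too

        // Capacity matches 20% of heap, and the cache is >80% full: the working
        // set simply doesn't fit, which is consistent with a ~72% hit ratio.
        System.out.printf("capacity=%d expected=%.0f used=%.0f%%%n",
                capacity, expected, 100.0 * blockCacheSize / capacity);
    }
}
```

Since all three servers show the same pattern (cache at its configured capacity, hit ratio around 70), raising the fraction on one server as suggested above is a cheap way to see whether the hit ratio, and with it read throughput, recovers.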
