On Mon, Feb 15, 2010 at 10:05 PM, James Baldassari <ja...@dataxu.com> wrote:
> Applying HBASE-2180 isn't really an option at this
> time because we've been told to stick with the Cloudera distro.
I'm sure they wouldn't mind (smile). It seems to about double throughput.

> If I had to guess, I would say the performance issues start to happen
> around the time the region servers hit max heap size, which occurs
> within minutes of exposing the app to live traffic. Could GC be killing
> us? We use the concurrent collector as suggested. I saw on the
> performance page some mention of limiting the size of the new generation,
> like -XX:NewSize=6m -XX:MaxNewSize=6m. Is that worth trying?

Enable GC logging for a while? See hbase-env.sh. Uncomment this line:

# export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"

Are you using a recent JVM? 1.6.0_10 or greater? 1.6.0_18 might have issues.

What do CPU and iowait ('wa' in top) look like on these machines,
particularly the loaded one? How many disks are in the machines?

St.Ack
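For reference, a minimal hbase-env.sh sketch pulling together what is discussed above (GC logging plus a capped new generation). It assumes the stock 0.20-era defaults; the 6m figure is just the wiki page's example value, not a tested recommendation, and the heap size is only meant to line up with the roughly 4 GB maxHeap in the stats quoted further down. Adjust paths and sizes for your own cluster.

  # Region server heap in MB; about 4 GB, in line with the maxHeap=4079 in the stats below.
  export HBASE_HEAPSIZE=4000

  # Concurrent (CMS) collector, the stock 0.20-era setting, plus the small new
  # generation mentioned on the performance wiki page (6m is an example value).
  export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:NewSize=6m -XX:MaxNewSize=6m"

  # GC logging, the line Stack points to above:
  export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"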
> Here are the new region server stats along with load averages:
>
> Region Server 1:
> request=0.0, regions=16, stores=16, storefiles=33, storefileIndexSize=4,
> memstoreSize=1, compactionQueueSize=0, usedHeap=2891, maxHeap=4079,
> blockCacheSize=1403878072, blockCacheFree=307135816, blockCacheCount=21107,
> blockCacheHitRatio=84, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
> Load Averages: 10.34, 10.58, 7.08
>
> Region Server 2:
> request=0.0, regions=15, stores=16, storefiles=26, storefileIndexSize=3,
> memstoreSize=1, compactionQueueSize=0, usedHeap=3257, maxHeap=4079,
> blockCacheSize=661765368, blockCacheFree=193741576, blockCacheCount=9942,
> blockCacheHitRatio=77, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
> Load Averages: 1.90, 1.23, 0.98
>
> Region Server 3:
> request=0.0, regions=16, stores=16, storefiles=41, storefileIndexSize=4,
> memstoreSize=4, compactionQueueSize=0, usedHeap=1627, maxHeap=4079,
> blockCacheSize=665117184, blockCacheFree=190389760, blockCacheCount=9995,
> blockCacheHitRatio=70, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
> Load Averages: 2.01, 3.56, 4.18
>
> That first region server is getting hit much harder than the others.
> They're identical machines (8-core), and the distribution of keys should
> be fairly random, so I'm not sure why that would happen. Any other
> ideas or suggestions would be greatly appreciated.
>
> Thanks,
> James
>
> On Mon, 2010-02-15 at 21:51 -0600, Stack wrote:
>> Yeah, I was going to say that if your loading is mostly read, you can
>> probably go up from the 0.2 given over to the cache. I like Dan's
>> suggestion of trying it first on one server, if you can.
>>
>> St.Ack
>>
>> On Mon, Feb 15, 2010 at 5:22 PM, Dan Washusen <d...@reactive.org> wrote:
>> > So roughly 72% of reads use the blocks held in the block cache...
>> >
>> > It would be interesting to see the difference between when it was
>> > working OK and now. Could you try increasing the memory allocated to
>> > one of the region servers and also increasing the "hfile.block.cache.size"
>> > to say '0.4' on the same region server?
>> >
>> > On 16 February 2010 11:54, James Baldassari <ja...@dataxu.com> wrote:
>> >
>> >> Hi Dan. Thanks for your suggestions. I am doing writes at the same
>> >> time as reads, but there are usually many more reads than writes. Here
>> >> are the stats for all three region servers:
>> >>
>> >> Region Server 1:
>> >> request=0.0, regions=15, stores=16, storefiles=34, storefileIndexSize=3,
>> >> memstoreSize=308, compactionQueueSize=0, usedHeap=3096, maxHeap=4079,
>> >> blockCacheSize=705474544, blockCacheFree=150032400, blockCacheCount=10606,
>> >> blockCacheHitRatio=76, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
>> >>
>> >> Region Server 2:
>> >> request=0.0, regions=16, stores=16, storefiles=39, storefileIndexSize=4,
>> >> memstoreSize=225, compactionQueueSize=0, usedHeap=3380, maxHeap=4079,
>> >> blockCacheSize=643172800, blockCacheFree=212334144, blockCacheCount=9660,
>> >> blockCacheHitRatio=69, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
>> >>
>> >> Region Server 3:
>> >> request=0.0, regions=13, stores=13, storefiles=31, storefileIndexSize=4,
>> >> memstoreSize=177, compactionQueueSize=0, usedHeap=1905, maxHeap=4079,
>> >> blockCacheSize=682848608, blockCacheFree=172658336, blockCacheCount=10262,
>> >> blockCacheHitRatio=72, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
>> >>
>> >> The average blockCacheHitRatio is about 72. Is this too low? Anything
>> >> else I can check?
>> >>
>> >> -James
>> >>
>> >> On Mon, 2010-02-15 at 18:16 -0600, Dan Washusen wrote:
>> >> > Maybe the block cache is thrashing?
>> >> >
>> >> > If you are regularly writing data to your tables then it's possible
>> >> > that the block cache is no longer effective. On the region server
>> >> > web UI, check the blockCacheHitRatio value. You want this value to be
>> >> > high (0 - 100). If this value is low, it means that HBase has to go to
>> >> > disk to fetch blocks of data. You can control the amount of JVM memory
>> >> > that HBase allocates to the block cache using the
>> >> > "hfile.block.cache.size" property (the default is 0.2, i.e. 20%).
>> >> >
>> >> > Cheers,
>> >> > Dan
>> >> >
>> >> > On 16 February 2010 10:45, James Baldassari <ja...@dataxu.com> wrote:
>> >> > >
>> >> > > Hi,
>> >> > >
>> >> > > Does anyone have any tips to share regarding optimization for random
>> >> > > read performance? For writes I've found that setting a large write
>> >> > > buffer and setting auto-flush to false on the client side
>> >> > > significantly improved put performance. Are there any similar easy
>> >> > > tweaks to improve random read performance?
>> >> > >
>> >> > > I'm using HBase 0.20.3 in a very read-heavy real-time system with 1
>> >> > > master and 3 region servers. It was working OK for a while, but today
>> >> > > there was a severe degradation in read performance. Restarting Hadoop
>> >> > > and HBase didn't help, and there are no errors in the logs. Read
>> >> > > performance starts off around 1,000-2,000 gets/second but quickly
>> >> > > (within minutes) drops to around 100 gets/second.
>> >> > >
>> >> > > I've already looked at the performance tuning wiki page. On the
>> >> > > server side I've increased hbase.regionserver.handler.count from 10
>> >> > > to 100, but it didn't help. Maybe this is expected because I'm only
>> >> > > using a single client to do reads. I'm working on implementing a
>> >> > > client pool now, but I'm wondering if there are any other settings
>> >> > > on the server or client side that might improve things.
>> >> > >
>> >> > > Thanks,
>> >> > > James
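To make the client-side tweaks above concrete, here is a minimal sketch against the 0.20 client API. The table name, column family, row key, and buffer size are made up for illustration; setAutoFlush(false) plus a larger write buffer is the put-side change James describes, and on the read side the main client-side lever is issuing Gets from several threads, each with its own HTable instance, since HTable is not safe for concurrent use.

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.util.Bytes;

  public class ClientTuningSketch {
      public static void main(String[] args) throws Exception {
          // 0.20-style configuration; picks up hbase-site.xml from the classpath.
          HBaseConfiguration conf = new HBaseConfiguration();

          // Put-side tweaks mentioned above: buffer puts on the client, flush in bulk.
          HTable writer = new HTable(conf, "mytable");   // "mytable" and "cf" are made-up names
          writer.setAutoFlush(false);
          writer.setWriteBufferSize(12 * 1024 * 1024);   // 12 MB, an arbitrary example size
          Put put = new Put(Bytes.toBytes("row-00001"));
          put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));
          writer.put(put);                               // buffered on the client
          writer.flushCommits();                         // pushes the buffered puts to the servers

          // Read side: plain random Gets. Use one HTable per reader thread,
          // since HTable is not safe for concurrent use.
          HTable reader = new HTable(conf, "mytable");
          Get get = new Get(Bytes.toBytes("row-00001"));
          Result result = reader.get(get);
          byte[] value = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"));
          System.out.println("value = " + (value == null ? "<none>" : Bytes.toString(value)));
      }
  }

The hbase.regionserver.handler.count bump mentioned above is a server-side property set in hbase-site.xml, so it needs no client-side change; the client pool James mentions is presumably about giving each reader thread its own HTable along these lines.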