Hi,
Enabling compression on an existing table is a simple "disable, alter
table, enable", and after a major compaction the table is compressed.
You first have to get LZO set up though (see the HBase wiki), but then
it is as easy as the above.
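A minimal sketch of those shell commands, assuming a table called
'mytable' with a column family 'cf' (both names are just placeholders):

  disable 'mytable'
  alter 'mytable', {NAME => 'cf', COMPRESSION => 'LZO'}
  enable 'mytable'
  major_compact 'mytable'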
Lars
On Feb 16, 2010, at 9:09, Dan Washusen <d...@reactive.org> wrote:
Adding another node would defer the problem a little longer, but
enabling LZO would make the biggest difference! The compression it
achieved on my test cluster stunned me! I can't recall the exact
numbers, but it was something on the order of a magnitude improvement
in file size. If you apply it to your table you won't have to worry
about the block cache for a while... :)
I'm not sure how to apply LZO to an existing table. One of the other
users will need to help you there...
On 16 February 2010 18:21, James Baldassari <ja...@dataxu.com> wrote:
No, we don't have LZO on the table right now. I guess that's something
else that we can try. I'll ask our ops team if we can steal another
node or two for the cluster if you think that will help. I'll report
back with results as soon as I can. Thanks again for working with me
on this! This is definitely the most responsive users list I've ever
posted to.
-James
On Tue, 2010-02-16 at 01:11 -0600, Dan Washusen wrote:
You could just add another node to your cluster to solve the immediate
problem, then keep an eye on load, etc. and preemptively add more
nodes as needed.
Out of interest, do you have LZO compression enabled on your table?
That makes the block cache and IO ops much more effective...
Regarding GC logging:
Also, another option for GC logging is 'jstat'. For example, running
the following command will print out the VM heap utilization every 1
second:
jstat -gcutil <pid> 1000
The last column shows the total amount of time (in seconds) spent
garbage collecting. You want to see very small increments... The other
interesting columns are "O" and "E". They show the percentage of Old
and Eden used. If old gen is staying up in the high 90's then there
are more long-lived objects than available memory...
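For what it's worth, the output looks something like this (these
particular numbers are made up, purely to show the columns):

  S0     S1     E      O      P     YGC    YGCT   FGC    FGCT    GCT
  0.00  12.44  71.02  95.88  59.51   421   3.205    18   6.932  10.137

"GCT" on the far right is the cumulative GC time in seconds; "E" and
"O" are the Eden and Old occupancy percentages.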
Cheers,
Dan
On 16 February 2010 17:54, James Baldassari <ja...@dataxu.com> wrote:
How much should I give the region servers? That machine is already
overallocated, by which I mean that the sum of the max heap sizes of
all Java processes running there is greater than the amount of
physical memory, which can lead to swapping. We have: Hadoop data
node, Hadoop task tracker, ZooKeeper peer, and region server. The
machine has 8G of physical memory. The region server currently has a
max heap size of 4G. Should I increase to 6G? Should I decrease the
block cache back down to 20% or even lower? Do we need to move to a
16G server?
Thanks,
James
On Tue, 2010-02-16 at 00:48 -0600, Dan Washusen wrote:
32% IO on region server 3! Ouch! :)
Increasing the block cache to 40% of VM memory without upping the
total available memory may have only exacerbated the issue. I notice
that region server 2 was already using 3300mb of the 4000mb heap. By
increasing the block cache size to 40% you have now given the block
cache 1600mb compared to the previous 800mb...
Can you give the region servers more memory?
Cheers,
Dan
On 16 February 2010 17:42, James Baldassari <ja...@dataxu.com> wrote:
On Tue, 2010-02-16 at 00:14 -0600, Stack wrote:
On Mon, Feb 15, 2010 at 10:05 PM, James Baldassari <ja...@dataxu.com> wrote:
Applying HBASE-2180 isn't really an option at this time because we've
been told to stick with the Cloudera distro.
I'm sure they wouldn't mind (smile). Seems to about double throughput.
Hmm, well I might be able to convince them ;)
If I had to guess, I would say the performance issues start to happen
around the time the region servers hit max heap size, which occurs
within minutes of exposing the app to live traffic. Could GC be
killing us? We use the concurrent collector as suggested. I saw on the
performance page some mention of limiting the size of the new
generation like -XX:NewSize=6m -XX:MaxNewSize=6m. Is that worth trying?
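(Concretely, I think that would mean adding something like this to
hbase-env.sh, alongside the concurrent collector flag we already use;
the 6m values are just the ones quoted from the performance page, not
a recommendation:)

  export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC -XX:NewSize=6m -XX:MaxNewSize=6m"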
Enable GC logging for a while? See hbase-env.sh. Uncomment this line:
# export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"
I did uncomment that line, but I can't figure out where the
gc-hbase.log is. It's not with the other logs. When starting HBase the
GC output seems to be going to stdout rather than the file. Maybe a
Cloudera thing. I'll do some digging.
You are using a recent JVM? 1.6.0_10 or greater? 1.6.0_18 might have
issues.
We're on 1.6.0_16 at the moment.
What do CPU and iowait ('wa' in top) look like on these machines,
particularly the loaded machine?
How many disks are in the machines?
I'll have to ask our ops guys about the disks. The high load has now
switched from region server 1 to 3. I just saw in our logs that it
took 139383.065 milliseconds to do 5000 gets, ~36 gets/second, ouch.
Here are the highlights from top for each region server:
Region Server 1:
top - 01:39:41 up 4 days, 13:44, 4 users, load average: 1.89, 0.99, 1.19
Tasks: 194 total, 1 running, 193 sleeping, 0 stopped, 0 zombie
Cpu(s): 15.6%us, 5.8%sy, 0.0%ni, 76.9%id, 0.0%wa, 0.1%hi, 1.6%si, 0.0%st
Mem: 8166588k total, 8112812k used, 53776k free, 8832k buffers
Swap: 1052248k total, 152k used, 1052096k free, 2831076k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21961 hadoop 19 0 4830m 4.2g 10m S 114.3 53.6 37:26.58 java
21618 hadoop 21 0 4643m 578m 9804 S 66.1 7.3 19:06.89 java

Region Server 2:
top - 01:40:28 up 4 days, 13:43, 4 users, load average: 3.93, 2.17, 1.39
Tasks: 194 total, 1 running, 193 sleeping, 0 stopped, 0 zombie
Cpu(s): 11.3%us, 3.1%sy, 0.0%ni, 83.4%id, 1.2%wa, 0.1%hi, 0.9%si, 0.0%st
Mem: 8166588k total, 7971572k used, 195016k free, 34972k buffers
Swap: 1052248k total, 152k used, 1052096k free, 2944712k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15752 hadoop 18 0 4742m 4.1g 10m S 210.6 53.1 41:52.80 java
15405 hadoop 20 0 4660m 317m 9800 S 114.0 4.0 27:34.17 java

Region Server 3:
top - 01:40:35 up 2 days, 9:04, 4 users, load average: 10.15, 11.05, 11.79
Tasks: 195 total, 1 running, 194 sleeping, 0 stopped, 0 zombie
Cpu(s): 28.7%us, 10.1%sy, 0.0%ni, 25.8%id, 32.9%wa, 0.1%hi, 2.4%si, 0.0%st
Mem: 8166572k total, 8118592k used, 47980k free, 3264k buffers
Swap: 1052248k total, 140k used, 1052108k free, 2099896k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15636 hadoop 18 0 4806m 4.2g 10m S 206.9 53.3 87:48.81 java
15243 hadoop 18 0 4734m 1.3g 9800 S 117.6 16.7 63:46.52 java
-James
St.Ack
Here are the new region server stats along with load averages:
Region Server 1:
request=0.0, regions=16, stores=16, storefiles=33, storefileIndexSize=4, memstoreSize=1, compactionQueueSize=0, usedHeap=2891, maxHeap=4079, blockCacheSize=1403878072, blockCacheFree=307135816, blockCacheCount=21107, blockCacheHitRatio=84, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
Load Averages: 10.34, 10.58, 7.08

Region Server 2:
request=0.0, regions=15, stores=16, storefiles=26, storefileIndexSize=3, memstoreSize=1, compactionQueueSize=0, usedHeap=3257, maxHeap=4079, blockCacheSize=661765368, blockCacheFree=193741576, blockCacheCount=9942, blockCacheHitRatio=77, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
Load Averages: 1.90, 1.23, 0.98

Region Server 3:
request=0.0, regions=16, stores=16, storefiles=41, storefileIndexSize=4, memstoreSize=4, compactionQueueSize=0, usedHeap=1627, maxHeap=4079, blockCacheSize=665117184, blockCacheFree=190389760, blockCacheCount=9995, blockCacheHitRatio=70, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
Load Averages: 2.01, 3.56, 4.18
That first region server is getting hit much harder than the others.
They're identical machines (8-core), and the distribution of keys
should be fairly random, so I'm not sure why that would happen. Any
other ideas or suggestions would be greatly appreciated.
Thanks,
James
On Mon, 2010-02-15 at 21:51 -0600, Stack wrote:
Yeah, I was going to say that if your loading is mostly read, you can
probably go up from the 0.2 given over to cache. I like Dan's
suggestion of trying it first on one server, if you can.
St.Ack
On Mon, Feb 15, 2010 at 5:22 PM, Dan Washusen <d...@reactive.org> wrote:
So roughly 72% of reads use the blocks held in the block cache... It
would be interesting to see the difference between when it was working
OK and now. Could you try increasing the memory allocated to one of
the region servers and also increasing the "hfile.block.cache.size" to
say '0.4' on that same server?
On 16 February 2010 11:54, James Baldassari <ja...@dataxu.com> wrote:
Hi Dan. Thanks for your suggestions. I am doing writes at the same
time as reads, but there are usually many more reads than writes. Here
are the stats for all three region servers:
Region Server 1:
request=0.0, regions=15, stores=16, storefiles=34, storefileIndexSize=3, memstoreSize=308, compactionQueueSize=0, usedHeap=3096, maxHeap=4079, blockCacheSize=705474544, blockCacheFree=150032400, blockCacheCount=10606, blockCacheHitRatio=76, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0

Region Server 2:
request=0.0, regions=16, stores=16, storefiles=39, storefileIndexSize=4, memstoreSize=225, compactionQueueSize=0, usedHeap=3380, maxHeap=4079, blockCacheSize=643172800, blockCacheFree=212334144, blockCacheCount=9660, blockCacheHitRatio=69, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0

Region Server 3:
request=0.0, regions=13, stores=13, storefiles=31, storefileIndexSize=4, memstoreSize=177, compactionQueueSize=0, usedHeap=1905, maxHeap=4079, blockCacheSize=682848608, blockCacheFree=172658336, blockCacheCount=10262, blockCacheHitRatio=72, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
The average blockCacheHitRatio is about 72. Is this too low? Anything
else I can check?
-James
On Mon, 2010-02-15 at 18:16 -0600, Dan Washusen wrote:
Maybe the block cache is thrashing?
If you are regularly writing data to your tables then it's possible
that the block cache is no longer being effective. On the region
server web UI check the blockCacheHitRatio value. You want this value
to be high (it ranges from 0 to 100). If it is low it means that HBase
has to go to disk to fetch blocks of data. You can control the amount
of VM memory that HBase allocates to the block cache using the
"hfile.block.cache.size" property (the default is 0.2, i.e. 20%).
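If you want to try bumping it, the property goes in hbase-site.xml on
the region servers and needs a restart to take effect; for example, to
give the cache 40% of the heap (the value suggested elsewhere in this
thread):

  <property>
    <name>hfile.block.cache.size</name>
    <value>0.4</value>
  </property>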
Cheers,
Dan
On 16 February 2010 10:45, James Baldassari <ja...@dataxu.com> wrote:
Hi,
Does anyone have any tips to share regarding optimization for random
read performance? For writes I've found that setting a large write
buffer and setting auto-flush to false on the client side
significantly improved put performance. Are there any similar easy
tweaks to improve random read performance?
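For context, the write-side tweaks I mentioned look roughly like this
with the 0.20 client API; the table name, column family, and buffer
size below are just placeholders:

  import java.io.IOException;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class BufferedWrites {
    public static void main(String[] args) throws IOException {
      HBaseConfiguration conf = new HBaseConfiguration();
      HTable table = new HTable(conf, "mytable");   // placeholder table name
      table.setAutoFlush(false);                    // don't flush on every put
      table.setWriteBufferSize(12 * 1024 * 1024);   // buffer ~12MB of puts client-side
      Put put = new Put(Bytes.toBytes("row1"));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("qual"), Bytes.toBytes("value"));
      table.put(put);
      // ... many more puts ...
      table.flushCommits();                         // push anything still buffered
    }
  }

With auto-flush off, puts accumulate in the client-side buffer until
it fills or flushCommits() is called, which is where the put speedup
came from.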
I'm using HBase 0.20.3 in a very read-heavy real-time system with 1
master and 3 region servers. It was working OK for a while, but today
there was a severe degradation in read performance. Restarting Hadoop
and HBase didn't help, and there are no errors in the logs. Read
performance starts off around 1,000-2,000 gets/second but quickly
(within minutes) drops to around 100 gets/second.
I've already looked at the performance tuning wiki page. On the server
side I've increased hbase.regionserver.handler.count from 10 to 100,
but it didn't help. Maybe this is expected because I'm only using a
single client to do reads. I'm working on implementing a client pool
now, but I'm wondering if there are any other settings on the server
or client side that might improve things.
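For reference, here's what that handler change looks like in
hbase-site.xml (100 is just the value I tried):

  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>100</value>
  </property>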
Thanks,
James