Forgot to mention. In addition to not over-committing your memory
resources to the JVMs, you should also set the kernel's swappiness
(vm.swappiness) to 0.
Info on how to do this (and links to the flame wars on the Linux kernel
mailing lists and Slashdot) here:
http://www.sollers.ca/blog/2008/swappiness/
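The short version, assuming a reasonably recent kernel and that you want
the setting to survive a reboot, is something like:

  # apply immediately (as root)
  sysctl -w vm.swappiness=0
  # persist across reboots
  echo "vm.swappiness = 0" >> /etc/sysctl.conf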
JG
Jonathan Gray wrote:
Stephen,
Above all, avoid swapping at all costs. If some of HBase's pages get
swapped out, and then the GC comes along and has to wait for those
pages to be brought back into memory, you're going to see very long
GC pauses, and all hell ensues from there.
You are running with 3GB + 2GB + 13 * 0.5GB - roughly 11.5GB of
committed heap - but you have only 4GB of memory. That's the kind of
heap-size configuration that requires 12-16GB of RAM.
For a 4GB setup, I would recommend a 2GB heap for the RS, 1GB (max) for
the DataNode, and 256MB for your tasks.
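Roughly, that translates into something like the following (exact file
locations and property names depend on your Hadoop/HBase versions):

  # conf/hbase-env.sh - regionserver heap, in MB
  export HBASE_HEAPSIZE=2000

  # conf/hadoop-env.sh - heap for the Hadoop daemons (DN, TT), in MB
  export HADOOP_HEAPSIZE=1000

  # conf/mapred-site.xml - per-task child JVM heap
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx256m</value>
  </property>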
With only 2 cores, you're going to have issues running the RS, DN, TT,
and MR tasks concurrently. I'd be wary of running more than 1 task per
node (I'm actually wary of running all of this together on 2 cores). The
JVM is very sensitive to CPU starvation, and if the processes cannot get
scheduled you are not going to get very far.
You don't necessarily need to give HBase top-notch nodes, but if you
want to run 4 different things (or 16 total, as you did) on only 2 cores
and 4GB, you're going to have problems. 4 cores and 4GB can get you a
bit further, but at that point you'll see enormous performance gains by
upping memory to 8GB or beyond.
Hope that explains it a bit better.
JG
stephen mulcahy wrote:
Jonathan Gray wrote:
Can you provide more information about the hardware on your nodes?
Sure. The cluster is 5 nodes, each with 4GB of ram, 2 cores (opteron
250) and 2 x 1TB drives.
I think I saw you had 13 mappers running? And only 512MB of heap for
the regionserver? That is a very small amount of heap for HBase to run
with.
The HBASE_HEAPSIZE is set to 3000. Checking the HADOOP_HEAPSIZE, I see
it's set to 2000 - so before I run any M/R jobs I guess I have some
problems here. I inherited the config so I didn't check this before
now (mea culpa) - I guess this illustrates why the data nodes might
have been dropping out anyway.
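(For anyone else bitten by an inherited config: an easy way to
double-check what each JVM is actually getting is to look at the -Xmx
flags of the running processes, e.g.:

  # print each running JVM with the arguments it was launched with
  jps -v | grep Xmx

rather than trusting the env files.)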
The 512MB heap was allocated to the mapred tasks - and I was
experimenting with different numbers of mappers and reducers in an
effort to get the Maholo backup job running. Initial efforts with 2+1
didn't get anywhere, while bumping it to 13+5 seemed to let it
progress a lot further - but that obviously caused other problems.
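(For context, and assuming I'm reading our config correctly, the knobs I
was juggling all live in mapred-site.xml:

  mapred.child.java.opts                    -Xmx512m  (heap per task JVM)
  mapred.tasktracker.map.tasks.maximum      13        (concurrent mappers per node)
  mapred.tasktracker.reduce.tasks.maximum   5         (concurrent reducers per node)
)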
In terms of proper heap sizing for hadoop, hbase and mapreduce - should I
avoid swapping at all costs? What's a good proportion of memory between
the hbase heap and the hadoop heap? (I haven't seen this documented
anywhere - if it is, feel free to hit me with the rtfm stick :)
How many cores do you have and what are your disks like? 13 mappers
per node is WAY too high. If you have 4-core nodes and you are
running the regionserver alongside the datanode and also trying to run
MR tasks on the same nodes, you should probably not go over 2 concurrent
mappers per node.
Thanks for this - as I said, I was playing around with the number of
mappers in an effort to see if the backup job would proceed any better
- it sounds like this is a bad idea. Thanks and noted.
As far as robustness of HBase w.r.t. missing blocks, there's not much
HBase can do. HBase uses HDFS as its persistent storage. If the
blocks are not available, then your data is simply not available.
This would be the same for any database on any filesystem: if the
filesystem says the file doesn't exist, the database can't do much.
Sure, this makes sense. I guess I'm wondering: in the event that I
have lost some blocks, is my entire HBase hosed, or can it tolerate the
loss and remove the corrupt rows from the table(s), or do I need to
repopulate my data again? (I plan to do this in the longer term anyway,
but I'm wondering whether I need to accelerate that plan.)
It does seem that your issues are from too much load on insufficient
resources. And I would not expect 0.20 to behave better in that
respect, as it now uses the CMS garbage collector and is a bit more
resource-hungry than its predecessors (more resources, but far
better performance).
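(For reference, the collector is selected through the JVM options in
hbase-env.sh; a sketch, though the exact flags shipped with 0.20 may
differ:

  export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
)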
Thanks also for this clarification - it sounds like 0.20 is still
worth pursuing for the other reason, but as you note, it's not going
to expand my capacity.
Many thanks for the detailed reply, lots to reflect on.
-stephen