Genady wrote:
Thanks for your answer, Jean-Adrien.
I've verified this by setting the timeout parameter back to its default value and
xceivers to the original 3000 (too small for the number of regions in our env). After a
while HBase did indeed manage to start (with tons of "exceeds xceiver limit"
exceptions). Nevertheless, the performance of the MR task remains too slow,
probably, as Jean-Daniel suggested in a previous post, as a result of too many
regions per region server, so we are going to increase the file size and rebuild
the data.
Leaving the default means that fewer resources are occupied concurrently
in the datanode -- sockets and threads for under-utilized files are
let go (you'll see the timeout exception in your log when the let-go
happens). Resource usage peaks at startup, when all the region
opens are happening. You might even consider lowering the
timeout from the default 8 minutes to something like 2 or 4 if you run into max
xceivers again.
Tell us more about your slowness before you go about changing region
sizes. How is it slow? Is it lookups against the .META. table? Try
some yourself in the shell to see how well these are doing. See if you
can narrow down why it's slow. Are you swapping (as J-D asked earlier)? How
long does the MR job run? Is it slow over its whole life? Are your
tasks short? If so, you might make them run longer so you better
exploit the cache of region locations built by a client. How many
mappers do you have running concurrently? If many, try cutting them in
half.
Regarding your question about JVM errors, according to the following post it
seems that in the case of this OOM error ("java.lang.OutOfMemoryError:
unable to create new native thread"), increasing the heap size will not
prevent the OOM problem:
http://www.egilh.com/blog/archive/2006/06/09/2811.aspx
Yes, it's a complaint about resources outside of the JVM heap. Upping
heap size won't help. You could try playing with -Xss -- thread
stack size -- lowering it from whatever the java6 default is to see if
that helps.
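As a rough illustration of why heap size is beside the point here: each native thread reserves its own stack outside the heap, so the reservation grows with the thread count. Below is a back-of-the-envelope sketch in Python. The 1024k and 256k stack sizes are assumed values for illustration only (the real default depends on the JVM and platform), and the helper name is made up.

```python
def stack_reservation_mb(threads, stack_kb):
    """Approximate address space reserved for thread stacks, in MB.

    This memory lives outside the JVM heap, so -Xmx does not bound it.
    """
    return threads * stack_kb / 1024.0


# 3000 is the xceiver limit discussed in this thread; the stack sizes
# are assumed values, not measured defaults.
for stack_kb in (1024, 256):
    mb = stack_reservation_mb(3000, stack_kb)
    print("3000 threads at -Xss%dk: ~%.0f MB of stack" % (stack_kb, mb))
```

Under those assumptions, shrinking the stack cuts the reservation proportionally, which is why -Xss is the knob to try rather than the heap.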
St.Ack
Anyway, after setting the Hadoop heap size to 1 or 1.5GB the error didn't come
back.
Gennady
probably as a result of increasing the number of xceiver threads,
-----Original Message-----
From: Jean-Adrien [mailto:[email protected]]
Sent: Wednesday, January 28, 2009 6:03 PM
To: [email protected]
Subject: Re: Hbase 0.19 failed to start: exceeds the limit of concurrent
xcievers 3000
Hello Genady,
You might be interested in one of our previous posts about this topic:
http://www.nabble.com/Datanode-Xceivers-td21372227.html
If you are using Hadoop / HBase 0.19, you should leave the timeout
dfs.datanode.socket.write.timeout at its original default value of 480000 (8
min).
Stack tested this, and the effect is that the Xcievers threads of Hadoop
eventually end with errors, but the errors do not affect HBase stability
now that HADOOP-3831 has been fixed for 0.19.
It should also decrease the number of threads, and therefore the memory
needed by the JVM process.
Personally, I haven't updated to 0.19 yet, therefore I haven't tested this
for now, but I can't wait...
One thing I don't understand in your problem is that the memory allocated
per thread in the JVM is not the heap, but the stack. Anyway, the global
process virtual memory allocated should decrease (which allows you to
increase the heap).
For your information, I run 3 region servers with a 512MB heap and about 150
regions each. I am seeing my first OOMs these days.
About Xcievers I see peaks of 1300 Xcievers during HBase startup with 2
datanodes, and a replication factor of 2; but if I enable the timeout I
guess about 800 should be enough.
Genady wrote:
Hi,
It seems that HBase 0.19 on Hadoop 0.19 fails to start because it exceeds the
limit of concurrent xceivers (in the Hadoop datanode logs), which is currently
3000. Setting more than 3000 xceivers causes a JVM out-of-memory
exception. Is there something wrong with the configuration parameters of the
cluster (three nodes, 430 regions, Hadoop heap size at the default of 1GB)?
Additional parameters in the hbase configuration are:
dfs.datanode.handler.count = 6,
dfs.datanode.socket.write.timeout=0
java.io.IOException: xceiverCount 3001 exceeds the limit of concurrent xcievers 3000
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:87)
    at java.lang.Thread.run(Thread.java:619)
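To get a feel for how often a datanode hits the limit, one could count these exceptions in its log. Here is a minimal Python sketch, assuming each log line carries the message in the form quoted above. The function name and the sample lines are hypothetical.

```python
import re

# Pattern modelled on the exception message quoted above.
LIMIT_RE = re.compile(
    r"xceiverCount (\d+) exceeds the limit of concurrent xcievers (\d+)"
)


def summarize_xceiver_errors(lines):
    """Return (number of matching lines, highest xceiverCount seen or None)."""
    hits = 0
    peak = None
    for line in lines:
        m = LIMIT_RE.search(line)
        if m:
            hits += 1
            count = int(m.group(1))
            if peak is None or count > peak:
                peak = count
    return hits, peak


# Hypothetical sample input; for a real log, pass an open file object.
sample = [
    "java.io.IOException: xceiverCount 3001 exceeds the limit of concurrent xcievers 3000",
    "some unrelated log line",
]
print(summarize_xceiver_errors(sample))
```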
Any help is much appreciated,
Genady