Jean-Adrien wrote:
Hi everybody,

I saw that you put some advice concerning the Hadoop settings when one has
a problem of max xceivers reached, in the troubleshooting section of the
wiki.

Yes. Thanks to your research, Jean-Adrien. And have you seen the addition made by Andrew Purtell suggesting upping the datanode listeners?

About this topic, I recently posted a question to the hadoop-core user mailing
list about their 'xcievers' thread behavior, since I still have to increase
their number as my HBase table grows, in order to avoid reaching the limit
at startup time. As a result, my JVM uses a lot of virtual memory (actually,
with 500MB for the heap, 1100 threads allocate 2GB of virtual memory). This
eventually leads to swapping and failure.

Yeah. That makes sense (Have you tried setting thread stack size down -- -Xss -- so less outside-of-the-heap memory is used?)
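
As a rough check on the numbers quoted above (500MB heap, 1100 threads, 2GB virtual memory), the sketch below backs out the per-thread stack reservation and estimates what a smaller -Xss would reserve. It assumes, as a simplification, that all virtual memory beyond the heap is thread stack; the 256KB value is only an illustrative choice, not a setting recommended in this thread.

```python
def implied_stack_kb(total_virtual_mb, heap_mb, num_threads):
    """Per-thread stack reservation in KB, assuming everything
    outside the heap is thread stack (a simplification)."""
    return (total_virtual_mb - heap_mb) * 1024.0 / num_threads


def stack_reservation_mb(num_threads, stack_kb):
    """Virtual memory in MB reserved for stacks at a given -Xss value."""
    return num_threads * stack_kb / 1024.0


if __name__ == "__main__":
    # Figures from the post: 2GB virtual, 500MB heap, 1100 threads.
    print(implied_stack_kb(2048, 500, 1100))
    # Hypothetical -Xss256k with the same thread count.
    print(stack_reservation_mb(1100, 256))
```

Under these assumptions each thread reserves a bit over 1.4MB, and dropping the stack size to 256KB would bring the stack reservation for 1100 threads down to 275MB.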

Here is the link to my post, with a graph showing the number of threads the
datanode creates when I start HBase.
http://www.nabble.com/xceiverCount-limit-reason-td21349807.html#a21352818

You can see that all threads are created at HBase startup time and, if the
timeout (dfs.datanode.socket.write.timeout) is set, they all end with a
timeout failure.

The question for HBase is: why are the connections to Hadoop kept open (and
the threads as well)? Does this happen only in my case?

No. It happens for everyone. HBase keeps open its connection to every StoreFile. We do this to avoid paying the open cost every time a file is accessed, primarily to improve random-access performance. StoreFiles in HBase are based on the Hadoop MapFile. A MapFile is two SequenceFiles -- data and index. An open would require at least one trip to the namenode per SequenceFile to learn the blocks that make up the file, then a trip to the holding datanodes, first to read in the index (for a random access) and then to read the target block once its location is found. Instead, per StoreFile, on open, we read in the index (and then close the index file) and keep the DFSClient connection to the data file open so that block locations are kept in the HBase regionserver.
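
To make the trade-off concrete, here is a toy count of round trips under the two strategies. It is only a sketch of the reasoning above, not a measurement: the per-step trip counts are the "at least" minimums described in the paragraph, and the function and constant names are made up for illustration.

```python
# Minimum trips taken from the description above (assumed, not measured).
NAMENODE_TRIPS_PER_OPEN = 2  # one per SequenceFile: data and index
INDEX_READ_TRIPS = 1         # datanode trip to read the index
BLOCK_READ_TRIPS = 1         # datanode trip to read the target block


def trips_reopen_each_time(num_reads):
    """Round trips if the StoreFile is opened afresh for every random read."""
    per_read = NAMENODE_TRIPS_PER_OPEN + INDEX_READ_TRIPS + BLOCK_READ_TRIPS
    return num_reads * per_read


def trips_keep_open(num_reads):
    """Round trips if the file is opened once, the index is held in memory,
    and the data-file connection stays open."""
    open_cost = NAMENODE_TRIPS_PER_OPEN + INDEX_READ_TRIPS
    return open_cost + num_reads * BLOCK_READ_TRIPS


if __name__ == "__main__":
    for n in (1, 1000):
        print(n, trips_reopen_each_time(n), trips_keep_open(n))
```

In this model a single read costs the same either way, while 1000 random reads cost 4000 trips with reopening against 1003 with the connection kept open, which is the saving being bought with the long-lived datanode threads.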

Keeping open a permanent connection to the store file costs us. Users will trip over the 'too many open files...' pretty early on unless they up their ulimit for file descriptors. Also, keeping the index in memory as we currently do is the main cause of heap usage -- particularly if cells are small. Then there is the cost over in HDFS which is what you are bringing up here.
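
A quick way to see whether the file-descriptor limit is likely to be hit is to compare it with the number of StoreFiles held open. The sketch below assumes one descriptor per open StoreFile and a hypothetical allowance of 256 descriptors for everything else in the process; both figures are illustrative assumptions, not values taken from HBase.

```python
def fd_headroom(soft_limit, open_store_files, other_fds=256):
    """Descriptors left over; a negative result means the limit is too low.

    Assumes one descriptor per open StoreFile; other_fds is a
    hypothetical allowance for logs, sockets, jars and so on.
    """
    return soft_limit - (open_store_files + other_fds)


def current_soft_limit():
    """Soft limit on open files for this process (Unix only)."""
    import resource
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    print(fd_headroom(current_soft_limit(), open_store_files=1100))
```

With a soft limit of 1024 and 1100 open StoreFiles the headroom is negative, which is the situation in which 'too many open files' shows up; raising the limit with ulimit restores a positive margin.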

lava has the same problem. But I don't think everybody does,
since the cluster could not run without disabling the timeout parameter
dfs.datanode.socket.write.timeout

Has anybody else made these observations?

I haven't been paying attention of late. Thanks for bringing it up, Jean-Adrien. Let's try and figure it out (I 'thought' that the timer over on the datanode would close idle sockets and that subsequent accesses would revive the connection, but that doesn't seem to be the case going by your hadoop posting).

St.Ack
