Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The following page has been changed by stack:
http://wiki.apache.org/hadoop/Hbase/FAQ

------------------------------------------------------------------------------
  
  The math runs roughly as follows: per column family there is at least one 
mapfile, and possibly up to 5 or 6 if a region is under load (let's say 3 per 
column family on average).  Multiply by the number of regions per region 
server.  So, for example, with a schema of 3 column families per region 
and 100 regions per regionserver, the JVM will open 3 * 3 * 100 
mapfiles -- 900 file descriptors, not counting open jar files, conf files, etc. 
(Run 'lsof -p REGIONSERVER_PID' to see for sure.)
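The arithmetic above can be sketched as a quick shell check; the numbers here are the FAQ's illustrative figures, not limits of any kind:

```shell
# Rough mapfile file-descriptor estimate, using the example numbers
# from the paragraph above (3 families, ~3 mapfiles each, 100 regions).
column_families=3
mapfiles_per_family=3    # average per family when a region is under load
regions_per_server=100
fds=$((column_families * mapfiles_per_family * regions_per_server))
echo "$fds mapfile descriptors (before jars, conf files, sockets)"
```

Compare the result against the per-process limit reported by 'ulimit -n' on the regionserver host.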
  
+ Or you may be running into 
[http://pero.blogs.aprilmayjune.org/2009/01/22/hadoop-and-linux-kernel-2627-epoll-limits/
 kernel limits].
+ 
  '''7. [[Anchor(7)]] What can I do to improve hbase performance?'''
  
  A configuration that can help with random reads, at some cost in memory, is 
making '''hbase.io.index.interval''' smaller.  By default, when hbase writes 
store files it adds an entry to the mapfile index on every 32nd addition (for 
hadoop, the default is every 128th addition).  Adding entries more frequently -- 
every 16th or every 8th -- means less seeking around looking 
for the wanted entry, but at the cost of hbase carrying a larger index 
(indices are read into memory on mapfile open; by default there are one to five 
or so mapfiles per column family per region loaded into a regionserver).
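The index-size side of that trade-off is easy to see with a little arithmetic; the store-file entry count below is an illustrative assumption, not an HBase default:

```shell
# In-memory index entries per store file for various index intervals.
# 100000 entries per store file is an assumed figure for illustration.
entries=100000
for interval in 32 16 8; do
  echo "interval=$interval -> $((entries / interval)) index entries in memory"
done
```

Halving the interval halves seek distance within the file but doubles the index held in memory, multiplied across every open mapfile on the regionserver.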
