Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The following page has been changed by stack:
http://wiki.apache.org/hadoop/Hbase/FAQ

The comment on the change is:
Add how to access hbase from non-java languages

------------------------------------------------------------------------------
  1. [#1 Can someone give an example of basic API-usage going against hbase?]
  1. [#2 What other hbase-like applications are there out there?]
- 1. [#3 Can I fix O!utOfMemoryExceptions in hbase?]
+ 1. [#3 Can I fix OutOfMemoryExceptions in hbase?]
  1. [#4 How do I enable hbase DEBUG-level logging?]
  1. [#5 Why do I see "java.io.IOException...(Too many open files)" in my logs?]
  1. [#6 What can I do to improve hbase performance?]
+ 1. [#7 How do I access Hbase from my Ruby/Python/Perl/PHP/etc. application?]

== Answers ==

@@ -55, +56 @@
  * [wiki:Hbase/PNUTS PNUTS], a Platform for Nimble Universal Table Storage, being developed internally at Yahoo!
  * [http://www.amazon.com/gp/browse.html?node=342335011 Amazon SimpleDB] is a web service for running queries on structured data in real time.

- '''3. [[Anchor(3)]] Can I fix O!utOfMemoryExceptions in hbase?'''
+ '''3. [[Anchor(3)]] Can I fix OutOfMemoryExceptions in hbase?'''

Out-of-the-box, hbase uses the default JVM heap size. Set the ''HBASE_HEAPSIZE'' environment variable in ''${HBASE_HOME}/conf/hbase-env.sh'' if your install needs to run with a larger heap. ''HBASE_HEAPSIZE'' is like ''HADOOP_HEAPSIZE'' in that its value is the desired heap size in MB. The surrounding '-Xmx' and 'm' needed to form the maximum-heap-size Java option are added by the hbase start script (see how ''HBASE_HEAPSIZE'' is used in the ''${HBASE_HOME}/bin/hbase'' script for clarification).

@@ -65, +66 @@
'''5. [[Anchor(5)]] Why do I see "java.io.IOException...(Too many open files)" in my logs?'''

- Running an Hbase loaded w/ more than a few regions, its possible to blow past the environment file handle limit for the user running the process.
Running out of file handles is like an OOME, things start to fail in strange ways. To up the users' file handles, edit '''/etc/security/limits.conf''' on all nodes and restart your cluster.
+ Currently Hbase is a file handle glutton. Running an Hbase loaded w/ more than a few regions, it's possible to blow past the common 1024 default file handle limit for the user running the process. Running out of file handles is like an OOME: things start to fail in strange ways. To up the user's file handles, edit '''/etc/security/limits.conf''' on all nodes and restart your cluster.

'''6. [[Anchor(6)]] What can I do to improve hbase performance?'''

- To improve random-read performance, if you can, try making the hdfs block size smaller (as is suggested in the bigtable paper). By default its 64MB. Try setting it to 8MB. On every random read, hbase has to fetch from hdfs the blocks that contain the wanted row. If your rows are small, much smaller than the hdfs block size, then we'll be fetching a lot of data only to discard the bulk. Meantime the big block fetches and processing consume CPU, network, etc. in the datanodes and hbase client.
- 
- Another configuration that can help with random reads at some cost in memory is making the '''hbase.io.index.interval''' smaller.
+ A configuration that can help with random reads at some cost in memory is making the '''hbase.io.index.interval''' smaller.
By default when hbase writes store files, it adds an entry to the mapfile index on every 32nd addition (for hadoop, the default is every 128th addition). Adding entries more frequently -- every 16th or every 8th -- will make it so there is less seeking around looking for the wanted entry, but at the cost of hbase carrying a larger index (indices are read into memory on mapfile open; by default there are one to five or so mapfiles per column family per region loaded into a regionserver). Some basic tests making the '''io.bytes.per.checksum''' larger -- checksum-checking every 4096 bytes instead of every 512 bytes -- seem to have no discernible effect on performance.
+ 
+ '''7. [[Anchor(7)]] How do I access Hbase from my Ruby/Python/Perl/PHP/etc. application?'''
+ 
+  * [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/ Description of how to launch a thrift service, client bindings and examples in ruby and C++] for connecting to Hbase
+  * [http://wiki.apache.org/hadoop/Hbase/HbaseRest REST Interface] to Hbase
+  * There is also a patch in [https://issues.apache.org/jira/browse/HADOOP-2171 HADOOP-2171] that will put up a server to parse and process [http://wiki.apache.org/hadoop/Hbase/HbaseShell HQL]
+
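The heap and file-handle advice in answers 3 and 5 of the changed page can be sketched as shell configuration. The concrete values here (a 1000 MB heap, a 32768-handle limit) and the ''hadoop'' username are illustrative assumptions, not recommendations from the page itself.

```shell
# Sketch only -- values below are example assumptions, tune for your install.

# In ${HBASE_HOME}/conf/hbase-env.sh: desired heap size in MB.
# The ${HBASE_HOME}/bin/hbase start script adds the surrounding
# '-Xmx' and 'm' to form the JVM maximum-heap option (-Xmx1000m here).
export HBASE_HEAPSIZE=1000

# In /etc/security/limits.conf on every node (user "hadoop" is an
# assumed example), then restart the cluster:
#   hadoop  soft  nofile  32768
#   hadoop  hard  nofile  32768

# Show the file handle limit in effect for the current shell:
ulimit -n
```

The `ulimit -n` check is a quick way to confirm whether a freshly logged-in session actually picked up the raised limit before restarting the daemons.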