Dear Wiki user, You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.
The following page has been changed by stack: http://wiki.apache.org/lucene-hadoop/Hbase/PerformanceEvaluation

The comment on the change is: First cut at description of the performance evaluation scripts

New page:

= Testing HBase Performance and Scalability =

[https://issues.apache.org/jira/browse/HADOOP-1476 HADOOP-1476] adds to the HBase {{{src/test}}} directory the script {{{org.apache.hadoop.hbase.PerformanceEvaluation}}}. It runs the tests described in ''Performance Evaluation'', Section 7 of the [http://labs.google.com/papers/bigtable.html BigTable paper]. See that section for test descriptions; they are not repeated below. The script is useful for evaluating HBase performance and how well it scales as region servers are added.

Here is the current usage for the {{{PerformanceEvaluation}}} script:

{{{
[EMAIL PROTECTED] ~]$ ./hadoop-trunk/src/contrib/hbase/bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation
Usage: java org.apache.hadoop.hbase.PerformanceEvaluation [--master=host:port] [--miniCluster] <command> <nclients>

Options:
 master          Specify host and port of HBase cluster master. If not present, address is read from configuration
 miniCluster     Run the test on an HBaseMiniCluster

Command:
 randomRead      Run random read test
 randomReadMem   Run random read test where table is in memory
 randomWrite     Run random write test
 sequentialRead  Run sequential read test
 sequentialWrite Run sequential write test
 scan            Run scan test

Args:
 nclients        Integer. Required. Total number of clients (and HRegionServers) running: 1 <= value <= 500

Examples:
 To run a single evaluation client:
 $ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
}}}

If you pass nclients > 1, {{{PerformanceEvaluation}}} starts up a mapreduce job in which each map runs a single loading client instance.
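In mapreduce mode, each map task runs one client, so every node must be able to reach the master. The invocation can be sketched as follows; the master address {{{master.example.com:60000}}} and the client count of 4 are assumptions for illustration, not values from this page:

```shell
# Sketch: run the sequentialWrite test with 4 concurrent clients.
# With nclients > 1, PerformanceEvaluation launches a mapreduce job in
# which each map runs a single loading client instance.
NCLIENTS=4                       # assumed client count; must satisfy 1 <= value <= 500
MASTER=master.example.com:60000  # assumed master host:port; omit --master to read it from configuration
${HADOOP_HOME}/src/contrib/hbase/bin/hbase \
  org.apache.hadoop.hbase.PerformanceEvaluation \
  --master=${MASTER} sequentialWrite ${NCLIENTS}
```

Passing 1 for nclients instead keeps the whole run inside a single local client, as in the usage example above.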
To run the {{{PerformanceEvaluation}}} script, first compile the HBase test classes:

{{{
$ cd ${HBASE_HOME}
$ ant compile-test
}}}

The above ant target compiles all test classes into {{{${HADOOP_HOME}/build/contrib/hbase/test}}}. It also generates {{{${HADOOP_HOME}/build/contrib/hbase/hadoop-hbase-test.jar}}}. The latter jar includes all HBase test and src classes and has {{{org.apache.hadoop.hbase.PerformanceEvaluation}}} as its {{{Main-Class}}}. Use the test jar when running {{{PerformanceEvaluation}}} on a hadoop cluster.

Here is how to run a single-client {{{PerformanceEvaluation}}} ''sequentialWrite'' test:

{{{
$ ${HADOOP_HOME}/src/contrib/hbase/bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
}}}

Here is how you would run the same test on a hadoop cluster:

{{{
$ ${HADOOP_HOME}/bin/hadoop jar ${HADOOP_HOME}/build/contrib/hbase/hadoop-hbase-test.jar sequentialWrite 1
}}}

For the latter, you will likely have to copy your hbase configurations -- e.g. your {{{${HBASE_HOME}/conf/hbase*.xml}}} files -- to {{{${HADOOP_HOME}/conf}}} and make sure they are replicated across the cluster, so that the running mapreduce job can find your hbase configurations (in particular, clients need to know the address of the HBase master).

Note that the mapreduce mode of the testing script works a little differently from single-client mode. It does not delete the test table at the end of each run, as is done when the script runs in single-client mode. Nor does it pre-run the '''sequentialWrite''' test before it runs the '''sequentialRead''' test (the table needs to be populated with data before sequentialRead can run). For the mapreduce version, the onus is on the operator to run the jobs in the correct order. To delete a table, use the hbase client.
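The configuration copying described above can be scripted. A minimal sketch, assuming passwordless ssh/rsync access to the hosts listed in {{{${HADOOP_HOME}/conf/slaves}}} (the standard hadoop slaves file; your cluster layout may differ):

```shell
# Sketch: copy the HBase configuration into the Hadoop conf directory and
# replicate it to every slave so mapreduce tasks can locate the HBase master.
cp ${HBASE_HOME}/conf/hbase*.xml ${HADOOP_HOME}/conf/
for host in $(cat ${HADOOP_HOME}/conf/slaves); do
  # assumes passwordless ssh access to each slave node
  rsync -a ${HADOOP_HOME}/conf/hbase*.xml ${host}:${HADOOP_HOME}/conf/
done
```

Any equivalent mechanism for distributing configuration files across the cluster works as well.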
{{{
$ ${HBASE_HOME}/bin/hbase client listTables
$ ${HBASE_HOME}/bin/hbase client deleteTable TestTable
}}}

Some first figures, in advance of any profiling of the current state of the HBase code (as of Fri Jun 8 2007), would seem to indicate that HBase runs about an order of magnitude slower than what is reported in the BigTable paper on similar hardware (more on this to follow).
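Putting the mapreduce-mode caveats together, a full run might be sequenced as below: populate first, then read, then drop the table by hand. This is a sketch only; the client count of 4 is an assumption, and {{{TestTable}}} is the table name the script uses:

```shell
# Sketch: in mapreduce mode the operator must order the jobs themselves;
# sequentialRead only works against a table already populated by sequentialWrite,
# and the table is not deleted automatically at the end of a run.
JAR=${HADOOP_HOME}/build/contrib/hbase/hadoop-hbase-test.jar
${HADOOP_HOME}/bin/hadoop jar ${JAR} sequentialWrite 4
${HADOOP_HOME}/bin/hadoop jar ${JAR} sequentialRead 4
${HBASE_HOME}/bin/hbase client deleteTable TestTable   # manual cleanup
```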