Dear Wiki user, You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.
The following page has been changed by stack: http://wiki.apache.org/lucene-hadoop/Hbase/PerformanceEvaluation

The comment on the change is: First cut at description of the performance evaluation scripts

New page:

= Testing HBase Performance and Scalability =

[https://issues.apache.org/jira/browse/HADOOP-1476 HADOOP-1476] adds to the HBase {{{src/test}}} directory the script {{{org.apache.hadoop.hbase.PerformanceEvaluation}}}. It runs the tests described in ''Performance Evaluation'', Section 7 of the [http://labs.google.com/papers/bigtable.html BigTable paper]. See that section for test descriptions; they are not repeated below. The script is useful for evaluating HBase performance and how well it scales as region servers are added.

Here is the current usage for the {{{PerformanceEvaluation}}} script:

{{{
[EMAIL PROTECTED] ~]$ ./hadoop-trunk/src/contrib/hbase/bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation
Usage: java org.apache.hadoop.hbase.PerformanceEvaluation [--master=host:port] [--miniCluster] <command> <nclients>

Options:
 master          Specify host and port of HBase cluster master. If not present, address is read from configuration
 miniCluster     Run the test on an HBaseMiniCluster

Command:
 randomRead      Run random read test
 randomReadMem   Run random read test where table is in memory
 randomWrite     Run random write test
 sequentialRead  Run sequential read test
 sequentialWrite Run sequential write test
 scan            Run scan test

Args:
 nclients        Integer. Required. Total number of clients (and HRegionServers) running: 1 <= value <= 500

Examples:
 To run a single evaluation client:
 $ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
}}}

If you pass nclients > 1, {{{PerformanceEvaluation}}} starts up a mapreduce job in which each map runs a single loading client instance.
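In mapreduce mode, each map task runs one client, so every node must be able to reach the master. The invocation can be sketched as follows; the master address {{{master.example.com:60000}}} and the client count of 4 are assumptions for illustration, not values from this page:

```shell
# Sketch: run the sequentialWrite test with 4 concurrent clients.
# With nclients > 1, PerformanceEvaluation launches a mapreduce job in
# which each map runs a single loading client instance.
NCLIENTS=4                       # assumed client count; must satisfy 1 <= value <= 500
MASTER=master.example.com:60000  # assumed master host:port; omit --master to read it from configuration
${HADOOP_HOME}/src/contrib/hbase/bin/hbase \
  org.apache.hadoop.hbase.PerformanceEvaluation \
  --master=${MASTER} sequentialWrite ${NCLIENTS}
```

Passing 1 for nclients instead keeps the whole run inside a single local client, as in the usage example above.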
To run the {{{PerformanceEvaluation}}} script, first compile the HBase test classes:

{{{
$ cd ${HBASE_HOME}
$ ant compile-test
}}}

The above ant target compiles all test classes into {{{${HADOOP_HOME}/build/contrib/hbase/test}}}. It also generates {{{${HADOOP_HOME}/build/contrib/hbase/hadoop-hbase-test.jar}}}. The latter jar includes all HBase test and src classes and has {{{org.apache.hadoop.hbase.PerformanceEvaluation}}} as its {{{Main-Class}}}. Use the test jar when running {{{PerformanceEvaluation}}} on a hadoop cluster.

Here is how to run a single-client {{{PerformanceEvaluation}}} ''sequentialWrite'' test:

{{{
$ ${HADOOP_HOME}/src/contrib/hbase/bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
}}}

Here is how you would run the same test on a hadoop cluster:

{{{
$ ${HADOOP_HOME}/bin/hadoop jar ${HADOOP_HOME}/build/contrib/hbase/hadoop-hbase-test.jar sequentialWrite 1
}}}

For the latter, you will likely have to copy your hbase configurations -- e.g. your {{{${HBASE_HOME}/conf/hbase*.xml}}} files -- to {{{${HADOOP_HOME}/conf}}} and make sure they are replicated across the cluster, so that the running mapreduce job can find your hbase configurations (in particular, clients need to know the address of the HBase master).

Note that the mapreduce mode of the testing script works a little differently from single-client mode. It does not delete the test table at the end of each run, as is done when the script runs in single-client mode. Nor does it pre-run the '''sequentialWrite''' test before it runs the '''sequentialRead''' test (the table needs to be populated with data before sequentialRead can run). For the mapreduce version, the onus is on the operator to run the jobs in the correct order. To delete a table, use the hbase client.
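The configuration copying described above can be scripted. A minimal sketch, assuming passwordless ssh/rsync access to the hosts listed in {{{${HADOOP_HOME}/conf/slaves}}} (the standard hadoop slaves file; your cluster layout may differ):

```shell
# Sketch: copy the HBase configuration into the Hadoop conf directory and
# replicate it to every slave so mapreduce tasks can locate the HBase master.
cp ${HBASE_HOME}/conf/hbase*.xml ${HADOOP_HOME}/conf/
for host in $(cat ${HADOOP_HOME}/conf/slaves); do
  # assumes passwordless ssh access to each slave node
  rsync -a ${HADOOP_HOME}/conf/hbase*.xml ${host}:${HADOOP_HOME}/conf/
done
```

Any equivalent mechanism for distributing configuration files across the cluster works as well.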
{{{
$ ${HBASE_HOME}/bin/hbase client listTables
$ ${HBASE_HOME}/bin/hbase client deleteTable TestTable
}}}

Some first figures, in advance of any profiling of the current state of the HBase code (as of Fri Jun 8 2007), would seem to indicate that HBase runs about an order of magnitude slower than what is reported in the BigTable paper on similar hardware (more on this to follow).
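Putting the mapreduce-mode caveats together, a full run might be sequenced as below: populate first, then read, then drop the table by hand. This is a sketch only; the client count of 4 is an assumption, and {{{TestTable}}} is the table name the script uses:

```shell
# Sketch: in mapreduce mode the operator must order the jobs themselves;
# sequentialRead only works against a table already populated by sequentialWrite,
# and the table is not deleted automatically at the end of a run.
JAR=${HADOOP_HOME}/build/contrib/hbase/hadoop-hbase-test.jar
${HADOOP_HOME}/bin/hadoop jar ${JAR} sequentialWrite 4
${HADOOP_HOME}/bin/hadoop jar ${JAR} sequentialRead 4
${HBASE_HOME}/bin/hbase client deleteTable TestTable   # manual cleanup
```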