Sounds like a very interesting experiment! We'd be happy to help you tweak and optimize your HBase installation to achieve peak performance.

But first off, 0.19.0 is an outdated, unsupported version. At the very least, please upgrade to 0.19.3. More importantly, the upcoming 0.20.0 release includes a large number of performance-focused improvements. An RC3 will be released this week; it is known to be quite stable, so I would strongly recommend running your tests on RC3. A final release is expected shortly thereafter.

Thanks, and keep us updated!

JG

Adam Silberstein wrote:
Hi,

For those interested, I want to tell the HBase community more about the
experimental results Raghu Ramakrishnan presented at VLDB last week, and
where we in Yahoo! Research are going from here.

First, our results from all systems were very preliminary, and Raghu
emphasized that.  We deployed each system with essentially its
out-of-the-box configuration and ran a series of simple experiments.
I'll briefly outline the experiments, and please let me know if you want
more details.  We deployed HBase 0.19.0 with 1 master server and 6
region servers, with 6 GB of heap space on each machine.  We loaded
120,000,000 1 KB records, where each record is essentially one column of
just random bytes.  We then ran a series of experiments where we set a
target read/update ratio, and then measured actual throughput and per-op
latency.  In our setting, an update is actually a read+write, and the
update overwrites the entire record.  We used two ratios: 50% read/50%
update, and 95% read/5% update.  The target throughputs (across 6 region
servers) were 600, 1200, 2400, 3600, 4800, and 6000.  Generally, we used 50
parallel clients to reach the target throughput, but also went up to 100
or 200 clients if necessary.  We targeted each experiment to run for 30
minutes, but allowed it to run longer if the system could not meet the
target throughput.
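(For readers who want a concrete picture of the workload above: this is
not our actual harness, just a minimal sketch of the closed-loop driver
it describes -- parallel clients paced toward a combined target
throughput, a configurable read fraction, and updates implemented as
read + full-record overwrite.  The in-memory dict standing in for the
store, and all names here, are illustrative assumptions.)

```python
import random
import threading
import time

def run_workload(store, num_records, target_ops_per_sec, read_fraction,
                 duration_sec, num_clients=50):
    """Closed-loop workload driver sketch.

    Each of num_clients threads issues operations, sleeping between them
    so the combined rate approaches target_ops_per_sec.  Returns
    (achieved_ops_per_sec, avg_latency_sec)."""
    # If every op were instantaneous, each client pausing this long
    # between ops would yield exactly the target aggregate rate.
    per_client_interval = num_clients / target_ops_per_sec
    latencies = []
    lock = threading.Lock()
    stop_at = time.time() + duration_sec

    def client():
        while time.time() < stop_at:
            key = random.randrange(num_records)
            start = time.time()
            if random.random() < read_fraction:
                _ = store.get(key)             # read
            else:
                _ = store.get(key)             # update = read + write,
                store[key] = b"x" * 1024       # overwriting the whole record
            elapsed = time.time() - start
            with lock:
                latencies.append(elapsed)
            # Sleep off whatever remains of this client's pacing interval;
            # if ops are slower than the interval, the target is missed.
            time.sleep(max(0.0, per_client_interval - elapsed))

    threads = [threading.Thread(target=client) for _ in range(num_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    achieved = len(latencies) / duration_sec
    avg_latency = sum(latencies) / len(latencies) if latencies else 0.0
    return achieved, avg_latency
```

A run at, say, a 95% read fraction and a target of 200 ops/sec over 10
clients would report how far the achieved throughput and per-op latency
fall from the target once the store's own latency is in the loop.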
We are now taking on the effort of expanding our benchmark beyond these
2 simple workloads to capture the key use cases for key-value stores.
One of our goals is to release the benchmark and the harness for running
it.  Another is to run the expanded benchmark against the systems we
have already tested, and perhaps more.  While we plan eventually to
publish our results in a conference or journal, I want to emphasize
that we will first circulate a draft of our findings to give the
various communities a chance to comment, make suggestions, tell us if
our results look way off, etc.
Thanks for your interest!

-Adam

