Hi,

For those interested, I want to tell the HBase community more about the experimental results Raghu Ramakrishnan presented at VLDB last week, and where we in Yahoo! Research are going from here.
First, our results from all systems were very preliminary, and Raghu emphasized that. We deployed each system with essentially its out-of-the-box configuration and ran a series of simple experiments. I'll briefly outline the experiments below; please let me know if you want more details.

We deployed HBase 0.19.0 with 1 master server and 6 region servers, with 6 GB of heap space on each machine. We loaded 120,000,000 1KB records, where each record is essentially one column of random bytes. We then ran a series of experiments where we set a target read/update ratio and measured the actual throughput and per-op latency. In our setting, an update is actually a read+write, and the update overwrites the entire record. We used two ratios: 50% read/50% update and 95% read/5% update. The target throughputs (across the 6 region servers) were 600, 1200, 2400, 3600, 4800, and 6000. Generally, we used 50 parallel clients to reach the target throughput, but went up to 100 or 200 clients when necessary. We targeted 30 minutes for each experiment, but allowed it to run longer if the system could not meet the target throughput.

We are now expanding our benchmark beyond these two simple workloads to capture the key use cases for key-value stores. One of our goals is to release the benchmark and the harness for running it. Another is to run the expanded benchmark against the systems we already have, and perhaps more. While we plan to eventually publish our results in a conference or journal, I want to emphasize that we will first circulate a draft of our findings to give the various communities a chance to comment, make suggestions, tell us if our results look way off, etc.

Thanks for your interest!

-Adam
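To make the workload description above concrete, here is a minimal sketch of what one client's loop does: pick a random record, read it or read-and-overwrite it according to the read fraction, record the latency, and throttle itself to its share of the target throughput. The KVStoreClient interface and its method names are placeholders for illustration only, not our actual harness or the HBase client API.

import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical store interface standing in for the real client (HBase, etc.);
// method names here are placeholders, not an actual API.
interface KVStoreClient {
    byte[] read(String key);
    void write(String key, byte[] value);
}

public class WorkloadClient implements Runnable {
    private static final int RECORD_COUNT = 120_000_000;   // records loaded into the store
    private static final int RECORD_SIZE = 1024;           // 1KB per record

    private final KVStoreClient store;
    private final double readFraction;      // 0.95 or 0.50 in our two workloads
    private final double targetOpsPerSec;   // this client's share of the cluster-wide target
    private final long runMillis;           // e.g. 30 minutes
    private final Random rand = new Random();
    final AtomicLong ops = new AtomicLong();           // completed operations
    final AtomicLong latencyNanos = new AtomicLong();   // summed per-op latency

    WorkloadClient(KVStoreClient store, double readFraction,
                   double targetOpsPerSec, long runMillis) {
        this.store = store;
        this.readFraction = readFraction;
        this.targetOpsPerSec = targetOpsPerSec;
        this.runMillis = runMillis;
    }

    @Override
    public void run() {
        long deadline = System.currentTimeMillis() + runMillis;
        long opIntervalNanos = (long) (1e9 / targetOpsPerSec); // throttle to the target rate
        while (System.currentTimeMillis() < deadline) {
            String key = "user" + rand.nextInt(RECORD_COUNT);
            long start = System.nanoTime();
            if (rand.nextDouble() < readFraction) {
                store.read(key);                 // plain read
            } else {
                store.read(key);                 // an update is a read...
                byte[] value = new byte[RECORD_SIZE];
                rand.nextBytes(value);
                store.write(key, value);         // ...plus overwriting the entire record
            }
            long elapsed = System.nanoTime() - start;
            ops.incrementAndGet();
            latencyNanos.addAndGet(elapsed);
            // Sleep off any time left in this operation's slot so we don't exceed the target.
            long sleepNanos = opIntervalNanos - elapsed;
            if (sleepNanos > 0) {
                try {
                    Thread.sleep(sleepNanos / 1_000_000, (int) (sleepNanos % 1_000_000));
                } catch (InterruptedException e) {
                    return;
                }
            }
        }
    }
}

For example, with 50 clients and a cluster-wide target of 4800, each client would be handed a targetOpsPerSec of 96; whether the clients can actually sustain that rate over the run is what tells us whether the system met the target throughput.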
