On Fri, Oct 23, 2009 at 6:56 PM, Jun Li <jltz922...@gmail.com> wrote:
> ...
> I then set up an HBase table with a row key of 48 bytes and a column
> that holds about 20 bytes of data. For a single client I measured, on
> average, 0.6 milliseconds per row for random writes and 0.4
> milliseconds per row for random reads.
>
> Then I had each machine in the cluster launch 1, 2, or 3 client test
> applications, with each client application reading/writing 100000
> rows per test run, for throughput testing. From my measurements,
> random writes perform best when each machine runs 2 clients (2*13=26
> clients in the cluster in total), at 8500 rows/second; random reads
> show almost the same throughput with 2 or 3 clients, at 35000
> rows/second.
> ...

Single server gives you 0.6ms to random-write and 0.4ms to random-read?
That's not bad. Random-write is slower because it's appending to the WAL.
The random-read must be coming from cache; otherwise I'd expect it to
take milliseconds (a disk seek).

Is the 8500 rows/second across the whole cluster? If a random write took
1ms, you should be doing about twice this rate over the cluster (if your
writes are not batched): 1 write/ms * 1000 ms/s * 13 servers = 13,000
writes/second. What kind of numbers are you looking for, Jun?

> So the question that I have is, following the original Google BigTable
> paper, should random write always be much faster than random read?

Random write should be faster than random read unless a good portion of
your dataset fits into cache: a random read involves a disk seek if
there is no cache hit, while a random write appends to a file, which
usually does not involve a seek.

> If that is the case, what are the tunable parameters in the HBase
> setup that I can explore to improve the random write speed?

It looks like batching won't help in your case because there is no
locality in your keying.

St.Ack
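
For concreteness, here is a minimal sketch of the kind of random-write
client being discussed, written against the 0.20-era HTable/Put client
API. The table name, column family, write-buffer size, and the `batch`
toggle are assumptions for illustration, not Jun's actual test code;
only the 48-byte keys, ~20-byte values, and 100000 rows per client come
from the numbers above.

import java.util.Random;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class RandomWriteSketch {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable table = new HTable(conf, "testtable");  // hypothetical table name

    // "Batching" = fill the client-side write buffer and flush many Puts
    // in one round trip instead of issuing one RPC per Put.
    boolean batch = false;
    table.setAutoFlush(!batch);
    if (batch) {
      table.setWriteBufferSize(2 * 1024 * 1024);   // arbitrary 2MB buffer
    }

    byte[] family = Bytes.toBytes("f");            // assumed column family
    byte[] qualifier = Bytes.toBytes("q");
    Random rnd = new Random();
    int rows = 100000;                             // rows per client, as in the test

    long start = System.currentTimeMillis();
    for (int i = 0; i < rows; i++) {
      byte[] key = new byte[48];                   // 48-byte row key
      byte[] value = new byte[20];                 // ~20-byte value
      rnd.nextBytes(key);                          // random keys: no locality
      rnd.nextBytes(value);
      Put put = new Put(key);
      put.add(family, qualifier, value);
      table.put(put);                              // buffered if autoflush is off
    }
    table.flushCommits();                          // push out anything still buffered
    long elapsed = Math.max(1, System.currentTimeMillis() - start);

    // 0.6ms/row from one client is ~1,666 rows/sec; 13 servers at roughly
    // 1 write/ms each is the ~13,000 rows/sec back-of-envelope above.
    System.out.println(rows + " rows in " + elapsed + "ms ("
        + (rows * 1000L / elapsed) + " rows/sec, "
        + ((double) elapsed / rows) + " ms/row)");
  }
}

The batching referred to above is the autoflush/write-buffer path in this
sketch: many Puts ride in one buffer flush instead of one RPC each. With
purely random keys, each flush still scatters across all 13 region
servers, which is why it buys little in this case.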