On Fri, Oct 23, 2009 at 6:56 PM, Jun Li <jltz922...@gmail.com> wrote:

> ...
> I then set up an HBase table with a row key of 48 bytes and a column
> that holds about 20 bytes of data.  For a single client, I was able to
> get, on average, writes of 0.6 milliseconds per row (random write)
> and reads of 0.4 milliseconds per row (random read).
>
> Then I had each machine in the cluster launch 1, 2, or 3 client test
> applications, with each client application reading/writing 100,000
> rows per test run, for throughput testing.  From my measurements, I
> found that random writes had the best throughput with 2 clients per
> machine (2*13=26 clients in the cluster total), at 8,500 rows/second;
> random reads had almost the same throughput with 2 or 3 clients, at
> 35,000 rows/second.
> ...
>

A single server gives you 0.6ms per random write and 0.4ms per random read?
That's not bad.  Random writes are slower because they append to the WAL.  The
random reads must be coming from cache; otherwise I'd expect them to take
milliseconds (a disk seek).

8500 rows/second is across the whole cluster?  At 1ms per random write, you
should be doing roughly 1.5x this rate over the cluster (if your writes are
not batched): 1 write/ms * 1000 ms/s * 13 clients = 13,000 rows/second.
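Spelled out, the back-of-envelope math is just per-client rate times client count (assuming unbatched writes and one client per machine):

```python
# Back-of-envelope: expected cluster-wide random-write throughput,
# assuming each unbatched write costs ~1 ms and clients run independently.
per_write_ms = 1.0            # assumed cost of one random write
clients = 13                  # one client per machine in the 13-node cluster

rows_per_sec_per_client = 1000.0 / per_write_ms   # 1000 writes/sec/client
cluster_rows_per_sec = rows_per_sec_per_client * clients

print(cluster_rows_per_sec)   # 13000.0
```

At the measured 0.6ms per write, the same arithmetic predicts even more headroom, which is why the observed 8,500 rows/second looks low.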

What kinda numbers are you looking for, Jun?



> So the question that I have is: following the original Google’s
> BigTable paper, should Random Write always be much faster than Random
> Read?


Random write should be faster than random read unless a good portion of your
dataset fits into cache (a random read involves a disk seek on a cache miss; a
random write appends to a file, which usually does not involve a disk seek).
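That cache argument can be made concrete as a blended latency. The 0.4ms cache-hit figure is from the thread; the ~10ms miss cost (one disk seek plus read) is an assumed round number for illustration, not a measurement:

```python
# Illustrative blended random-read latency as a function of cache hit rate.
# hit_ms = 0.4 is the observed cache-hit latency from this thread;
# miss_ms = 10.0 (seek + read) is an assumption, not a measurement.
def read_latency_ms(hit_rate, hit_ms=0.4, miss_ms=10.0):
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms

print(read_latency_ms(1.0))   # 0.4 -> every read served from cache
print(read_latency_ms(0.5))   # 5.2 -> half the reads pay a seek
```

Even a modest miss rate drags average read latency well past the ~0.6ms random-write cost, which is why random writes usually win once the dataset outgrows cache.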



>   If that is the case, what are the tunable parameters in the HBase
> setup that I can explore to improve Random Write speed?
>
>
It looks like batching won't help in your case because there is no locality in
your keying.

St.Ack
