Hi J-D and Michael,

Following your insightful suggestions, I will try setting
hfile.block.cache.size to 0 and also work on getting the region
servers to hold many more regions.  To reach 40-50 regions per
server, I would need a huge data set to fill my cluster. From your
experience, what kind of data sets have people used to reach 40-50
regions per server?  In particular, how large are the rows? Also,
have people ever tried reducing the region split threshold from the
default (64MB, I think) to a smaller size, so that regions split
faster and thus give better concurrency?
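
To be concrete, here is a minimal sketch of what I plan to put in
hbase-site.xml on the region servers. hfile.block.cache.size is the
setting you suggested; hbase.hregion.max.filesize and the 32MB value
are just my assumption for the split-threshold parameter, so please
correct me if I have the wrong name or default:

    <!-- hbase-site.xml (sketch; values are illustrative) -->
    <property>
      <name>hfile.block.cache.size</name>
      <value>0</value>          <!-- disable the block cache, as suggested -->
    </property>
    <property>
      <name>hbase.hregion.max.filesize</name>
      <value>33554432</value>   <!-- 32MB, to force regions to split earlier -->
    </property>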

Just to answer Michael's question regarding the performance
measurement I reported: 0.6 ms is the latency number. I measured it
by launching only a single client in the entire cluster to do the
reads/writes.
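
In case it is useful, the single-client measurement loop is roughly
the following (a simplified sketch against the 0.20 client API; the
table name, column family, and the randomKey helper are placeholders
rather than my exact code):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class LatencyProbe {
      public static void main(String[] args) throws Exception {
        HBaseConfiguration conf = new HBaseConfiguration();
        HTable table = new HTable(conf, "testtable");  // placeholder table name
        table.setAutoFlush(true);  // one RPC per put, no client-side batching

        int n = 100000;
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
          byte[] row = Bytes.toBytes(randomKey(48));   // 48-byte random row key
          Put put = new Put(row);
          // one column of roughly 20 bytes, as in the test data described below
          put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), new byte[20]);
          table.put(put);
        }
        double elapsedMs = (System.nanoTime() - start) / 1e6;
        System.out.println("avg write latency: " + (elapsedMs / n) + " ms/row");
      }

      // hypothetical helper: build a random fixed-length row key
      private static String randomKey(int len) {
        java.util.Random r = new java.util.Random();
        StringBuilder sb = new StringBuilder(len);
        for (int i = 0; i < len; i++) sb.append((char) ('a' + r.nextInt(26)));
        return sb.toString();
      }
    }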

But for the throughput measurement, I used 2 client test applications
on every machine, so I had 2*13=26 client application instances
running in the cluster, concurrently reading from and writing to the
HBase cluster. For each client performing the same read/write task,
the average latency climbs to about 3 ms (because it has to compete
with the other clients). That is, roughly, (1/0.003)*2*13 = 8600
calls/sec for the entire cluster. To be more accurate, I collected
the elapsed time it took for all the clients to finish their work; a
particular round took 5 minutes and 3 seconds, which translates to:

      2*100000*13/(5*60+3)=8580 calls/sec.

The two numbers agree well, because all the clients were launched
almost simultaneously, and they finished their work at almost the
same time as well.

For the next round of performance testing, I could scale my cluster
to 16 machines with 8 cores and 32GB of RAM per machine. From your
experience, I am curious what others have done or observed in terms
of the linear scalability of the current implementation, HBase 0.20.0.


Regards,

Jun


On Sun, Oct 25, 2009 at 10:05 AM, stack <st...@duboce.net> wrote:
> On Fri, Oct 23, 2009 at 6:56 PM, Jun Li <jltz922...@gmail.com> wrote:
>
>> ...
>> I then set up an Hbase table with a row key of 48 bytes, and a column
>> that holds about 20 Bytes data.  For a single client, I was able to
>> get in average, the write of 0.6 milliseconds per row (random write),
>> and the read of 0.4 milliseconds per row (random read).
>>
>> Then  I  had each machine in the cluster to launch 1, or 2,  or 3
>> client test applications, with each client test application read/write
>> 100000 rows for each test run, for throughput testing.  From my
>> measurement results, I found that the random write will have best
>> measured performance when each machine having 2 clients (totally
>> 2*13=26 clients in the cluster), with 8500 rows/second; and the random
>> read will have almost the same throughput for 2 or 3 clients, with
>> 35000 rows/second.
>> ...
>>
>
> Single server gives you 0.6ms to random-write and 0.4ms to random read?
> That's not bad.  Random-write is slower because it's appending to the WAL.  The
> random-read is coming from cache; otherwise I'd expect it to take milliseconds
> (disk-seek).
>
> 8500 rows/second is across the whole cluster?  If it took 1ms per random-write,
> you should be doing about twice this rate over the cluster (if your writes
> are not batched): 1ms * 13 * 1000.
>
> What kinda numbers are you looking for Jun?
>
>
>
>> So the question that I have is that, following the original Google’s
>> BigTable paper, should Random Write be always much faster than Random
>> Read?
>
>
> Random write should be faster than random read unless a good portion of your
> dataset fits into cache (random read involves disk seek if no cache hit;
> random write is appending to a file... which usually would not involve disk
> seek).
>
>
>
>>   If that is the case, what are the tunable parameters in terms
>> of HBase setup that I can explore to improve the Random Write speed.
>>
>>
> It looks like batching won't help in your case because no locality in your
> keying.
>
> St.Ack
>
