Thanks for the clarification. I changed ROW_LENGTH as you suggested and 
used the sequentialWrite + randomRead combination to benchmark. The initial 
result was impressive, even though I would like to see the last column improved.

randomRead
==========

totalrows \ nclients (--rows)    5 (10000)    50 (10000)   100 (10000)   1000 (10000)
800k                             0.4ms        3.5ms        6.5ms         55ms
2.3m                             0.45ms       3.5ms        6.6ms         56ms
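For reference, invocations along these lines should reproduce the runs above (a sketch based on the stock 0.20 PerformanceEvaluation CLI; adjust the class name if you run a modified copy, and treat the exact flags as assumptions):

```shell
# Sequential write with 2 clients (stock PE class name assumed):
bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 2

# Random read, 5 clients at 10000 rows each, skipping MapReduce:
bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation \
    --nomapred --rows=10000 randomRead 5
```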

The only change in the config was that I increased the handler count to 1000. I 
suspect there are other parameters that can be tweaked to improve this 
further?
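For completeness, that handler-count change goes in hbase-site.xml, roughly like this (the property name is from the stock 0.20 configuration; 1000 is the value used here):

```xml
<!-- hbase-site.xml: number of RPC handler threads per region server.
     Default in 0.20 is much lower; 1000 is the value used in this run. -->
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>1000</value>
</property>
```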

My goal is to test it with 10 million rows on this box. For some reason the 
sequentialWrite job with 5,000,000 rows + 2 clients failed, with the following 
exception:

09/08/19 00:34:07 INFO mapred.LocalJobRunner: 2000000/2050000/2500000
09/08/19 00:50:38 WARN mapred.LocalJobRunner: job_local_0001
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server Some server for region , row '0002076131', but failed after 11 attempts.
Exceptions:
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.

        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocationForRowWithRetries(HConnectionManager.java:995)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1064)
        at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:584)
        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:450)
        at org.evaluation.hbase.PerformanceEvaluation$SequentialWriteTest.testRow(PerformanceEvaluation.java:736)
        at org.evaluation.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:571)
        at org.evaluation.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:804)
        at org.evaluation.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:350)
        at org.evaluation.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:326)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:518)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:303)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)
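As a side note on key format: the row '0002076131' in the exception is consistent with the stock PE convention of random integer keys zero-padded to 10 digits. A minimal Java sketch of that convention (the class and method names here are hypothetical illustrations, not PE's actual API):

```java
import java.util.Random;

// Hypothetical sketch of the stock PerformanceEvaluation row-key
// convention: random integers zero-padded to 10 digits, matching the
// row '0002076131' above. A modified PE whose reader builds keys
// differently would only ever hit the subset of rows whose keys match.
public class RowKeySketch {
    static String formatRowKey(int n) {
        // Zero-pad to 10 digits, e.g. 2076131 -> "0002076131"
        return String.format("%010d", n);
    }

    public static void main(String[] args) {
        Random rand = new Random();
        int totalRows = 10000000; // the 10-million-row target above
        System.out.println(formatRowKey(rand.nextInt(totalRows)));
    }
}
```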


 

From the region server log:
2009-08-19 00:47:22,740 WARN org.apache.hadoop.hbase.regionserver.Store: Not in setorg.apache.hadoop.hbase.regionserver.storescan...@1e458ae5
2009-08-19 00:48:22,741 WARN org.apache.hadoop.hbase.regionserver.Store: Not in setorg.apache.hadoop.hbase.regionserver.storescan...@4b28029c
2009-08-19 00:49:22,743 WARN org.apache.hadoop.hbase.regionserver.Store: Not in setorg.apache.hadoop.hbase.regionserver.storescan...@33fb11e0
2009-08-19 00:50:22,745 WARN org.apache.hadoop.hbase.regionserver.Store: Not in setorg.apache.hadoop.hbase.regionserver.storescan...@6ceccc3b
2009-08-19 00:51:22,746 WARN org.apache.hadoop.hbase.regionserver.Store: Not in setorg.apache.hadoop.hbase.regionserver.storescan...@44a3b5b4

Thanks,
Murali Krishna




________________________________
From: Jonathan Gray <[email protected]>
To: [email protected]
Sent: Wednesday, 19 August, 2009 12:26:55 AM
Subject: Re: HBase-0.20.0 randomRead

With all that memory, you're likely seeing such good performance because 
of filesystem caching.  As you say, 2ms is extraordinarily fast for a 
disk read, but since your rows are relatively small, you are loading up 
all that data into memory (not only the fs cache, but also hbase's block 
cache which makes it even faster).

JG

Jean-Daniel Cryans wrote:
> Well it seems there's something wrong with the way you modified PE. It
> is not really testing your table unless the row keys are built the
> same way as TestTable's. To me it seems that you are testing on only
> 20000 rows, so caching is easy. A better test would be to use PE
> as it currently is, but with ROW_LENGTH = 4k.
> 
> WRT Jetty, make sure you optimized it with
> http://jetty.mortbay.org/jetty5/doc/optimization.html
> 
> J-D
> 
> On Tue, Aug 18, 2009 at 12:08 PM, Murali Krishna.
> P<[email protected]> wrote:
>> Ahh, mistake, I just took it as seconds.
>>
>> Now I wonder whether it can really be that fast? Won't it take at least 2ms 
>> for a disk read? (I have given 8G heap space to the RegionServer; is it caching 
>> that much?) Has anyone seen these kinds of numbers?
>>
>>
>> Actually, my initial problem was that I have a Jetty in front of this HBase 
>> to serve this 4k value, and when benchmarked, it took 200+ milliseconds per 
>> record with 100 clients. That is why I decided to benchmark without Jetty 
>> first.
>>
>> Thanks,
>> Murali Krishna
>>
>>
>>
>>
>> ________________________________
>> From: Jean-Daniel Cryans <[email protected]>
>> To: [email protected]
>> Sent: Tuesday, 18 August, 2009 9:13:40 PM
>> Subject: Re: HBase-0.20.0 randomRead
>>
>> Murali,
>>
>> I'm not reading the same thing as you.
>>
>> client-0 Finished randomRead in 2867ms at offset 0 for 10000 rows
>>
>> That means 2867 / 10000 = 0.2867ms per row. It's kinda fast.
>>
>> J-D
>>
>> On Tue, Aug 18, 2009 at 11:35 AM, Murali Krishna.
>> P<[email protected]> wrote:
>>> Hi all,
>>>  (Saw a related thread on performance, but starting a different one because 
>>> my setup is slightly different).
>>>
>>> I have a one-node setup with hbase-0.20 (alpha). It has around 11 million 
>>> rows across ~250 regions, each row with a ~20-byte key and a ~4k 
>>> value.
>>> Since my primary concern is randomRead, I modified the PerformanceEvaluation 
>>> code to read from this particular table. The randomRead test gave the following 
>>> result.
>>>
>>> 09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: client-1 Finished 
>>> randomRead in 2813ms at offset 10000 for 10000 rows
>>> 09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: Finished 1 in 2813ms 
>>> writing 10000 rows
>>> 09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: client-0 Finished 
>>> randomRead in 2867ms at offset 0 for 10000 rows
>>> 09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: Finished 0 in 2867ms 
>>> writing 10000 rows
>>>
>>>
>>> So it looks like it is taking around 280ms per record. Looking at the latest 
>>> HBase performance claims, I was expecting it below 10ms. Am I doing 
>>> something basically wrong, given such a huge difference :( ? Please help 
>>> me fix the latency.
>>>
>>> The machine config is:
>>> Processors:    2 x Xeon L5420 2.50GHz (8 cores)
>>> Memory:        13.7GB
>>> 12 Disks of 1TB each.
>>>
>>> Let me know if you need any more details.
>>>
>>> Thanks,
>>> Murali Krishna
> 
