Murali,
Which version of HBase are you running?
There was a fix that was just committed a few days ago for a bug that
manifested as null/empty HRI.
It has been fixed in RC2, so I recommend upgrading to that and trying
your upload again.
JG
Murali Krishna. P wrote:
Thanks for the clarification. I changed ROW_LENGTH as you suggested and used
the sequenceWrite + randomRead combination to benchmark. The initial results
were impressive, even though I would like to see the last column improved.
randomRead
==========
totalrows \ nclients (--rows)   5 (10000)   50 (10000)   100 (10000)   1000 (10000)
800k                            0.4ms       3.5ms        6.5ms         55ms
2.3m                            0.45ms      3.5ms        6.6ms         56ms
The only change in the config was that the handler count was increased to 1000.
I assume there are some parameters that can be tweaked to improve this further?
My goal is to test it with 10 million rows on this box. For some reason the
sequenceWrite job with 5,000,000 rows + 2 clients failed, with the following
exception:
09/08/19 00:34:07 INFO mapred.LocalJobRunner: 2000000/2050000/2500000
09/08/19 00:50:38 WARN mapred.LocalJobRunner: job_local_0001
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
region server Some server for region , row '0002076131', but failed after 11
attempts.
Exceptions:
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocationForRowWithRetries(HConnectionManager.java:995)
at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1064)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:584)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:450)
at
org.evaluation.hbase.PerformanceEvaluation$SequentialWriteTest.testRow(PerformanceEvaluation.java:736)
at
org.evaluation.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:571)
at
org.evaluation.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:804)
at
org.evaluation.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:350)
at
org.evaluation.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:326)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:518)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:303)
at
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)
From the region server log:
2009-08-19 00:47:22,740 WARN org.apache.hadoop.hbase.regionserver.Store: Not in
setorg.apache.hadoop.hbase.regionserver.storescan...@1e458ae5
2009-08-19 00:48:22,741 WARN org.apache.hadoop.hbase.regionserver.Store: Not in
setorg.apache.hadoop.hbase.regionserver.storescan...@4b28029c
2009-08-19 00:49:22,743 WARN org.apache.hadoop.hbase.regionserver.Store: Not in
setorg.apache.hadoop.hbase.regionserver.storescan...@33fb11e0
2009-08-19 00:50:22,745 WARN org.apache.hadoop.hbase.regionserver.Store: Not in
setorg.apache.hadoop.hbase.regionserver.storescan...@6ceccc3b
2009-08-19 00:51:22,746 WARN org.apache.hadoop.hbase.regionserver.Store: Not in
setorg.apache.hadoop.hbase.regionserver.storescan...@44a3b5b4
Thanks,
Murali Krishna
________________________________
From: Jonathan Gray <[email protected]>
To: [email protected]
Sent: Wednesday, 19 August, 2009 12:26:55 AM
Subject: Re: HBase-0.20.0 randomRead
With all that memory, you're likely seeing such good performance because
of filesystem caching. As you say, 2ms is extraordinarily fast for a
disk read, but since your rows are relatively small, you are loading up
all that data into memory (not only the filesystem cache, but also HBase's
block cache, which makes it even faster).
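If you want to take the block cache out of the equation when benchmarking, it
can be disabled per column family when the table is created. A minimal sketch,
assuming the 0.20 client API and made-up table/family names (the OS filesystem
cache would of course still be warm):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateNoCacheTable {
  public static void main(String[] args) throws Exception {
    // Made-up table and family names -- substitute your own.
    HTableDescriptor desc = new HTableDescriptor("BenchTable");
    HColumnDescriptor family = new HColumnDescriptor("info");
    family.setBlockCacheEnabled(false);  // reads will not be served from the block cache
    desc.addFamily(family);
    new HBaseAdmin(new HBaseConfiguration()).createTable(desc);
  }
}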
JG
Jean-Daniel Cryans wrote:
Well, it seems there's something wrong with the way you modified PE. It
is not really testing your table unless the row keys are built the
same way as TestTable's; to me it seems that you are testing on only
20000 rows, so caching is easy. A better test would be to just use PE
the way it currently is, but with ROW_LENGTH = 4k.
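To make 'built the same way as TestTable is' concrete: the stock 0.20
PerformanceEvaluation writes rows whose keys are integers zero-padded to ten
digits (the row '0002076131' in your error has that shape) and whose values are
ROW_LENGTH random bytes. Roughly, as a sketch (your modified copy may differ):

public class PeDataShape {
  // Value size used by the write tests; 4 * 1024 approximates the ~4k real values.
  static final int ROW_LENGTH = 4 * 1024;

  // Fixed-width, zero-padded row key, e.g. 2076131 -> "0002076131".
  public static byte[] format(final int number) {
    byte[] b = new byte[10];
    int d = Math.abs(number);
    for (int i = b.length - 1; i >= 0; i--) {
      b[i] = (byte) ((d % 10) + '0');
      d /= 10;
    }
    return b;
  }
}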
WRT Jetty, make sure you optimized it with
http://jetty.mortbay.org/jetty5/doc/optimization.html
J-D
On Tue, Aug 18, 2009 at 12:08 PM, Murali Krishna.
P<[email protected]> wrote:
Ahh, my mistake, I just took it as seconds.
Now I wonder whether it can really be that fast?? Won't it take at least 2ms
for a disk read? (I have given 8G heap space to the RegionServer; is it caching
that much?) Has anyone seen these kinds of numbers?
Actually, my initial problem was that I have a Jetty in front of this HBase to
serve this 4k value, and when benchmarked, it took 200+ milliseconds for each
record with 100 clients. That is why I decided to benchmark without Jetty first.
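The serving path behind Jetty is roughly like the sketch below (table, family
and qualifier names are placeholders, not the real code); the sketch keeps one
HTable per worker thread, since HTable is not thread-safe and is costly to
construct:

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ValueServlet extends HttpServlet {
  // One HTable per worker thread; HTable is not thread-safe and is expensive to build.
  private final ThreadLocal<HTable> table = new ThreadLocal<HTable>() {
    @Override protected HTable initialValue() {
      try {
        return new HTable(new HBaseConfiguration(), "ValueTable");  // placeholder table name
      } catch (IOException e) {
        throw new RuntimeException(e);
      }
    }
  };

  @Override
  protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
    byte[] key = Bytes.toBytes(req.getParameter("key"));
    Result r = table.get().get(new Get(key));
    byte[] value = r.getValue(Bytes.toBytes("data"), Bytes.toBytes("value"));  // placeholder family/qualifier
    if (value == null) {
      resp.sendError(HttpServletResponse.SC_NOT_FOUND);
      return;
    }
    resp.getOutputStream().write(value);  // the ~4k payload
  }
}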
Thanks,
Murali Krishna
________________________________
From: Jean-Daniel Cryans <[email protected]>
To: [email protected]
Sent: Tuesday, 18 August, 2009 9:13:40 PM
Subject: Re: HBase-0.20.0 randomRead
Murali,
I'm not reading the same thing as you.
client-0 Finished randomRead in 2867ms at offset 0 for 10000 rows
That means 2867 / 10000 = 0.2867ms per row. It's kinda fast.
J-D
On Tue, Aug 18, 2009 at 11:35 AM, Murali Krishna.
P<[email protected]> wrote:
Hi all,
(Saw a related thread on performance, but starting a different one because my
setup is slightly different).
I have a one-node setup with hbase-0.20 (alpha). It has around 11 million rows
across ~250 regions, each row with a ~20-byte key and a ~4k value.
Since my primary concern is randomRead, I modified the PerformanceEvaluation
code to read from this particular table (a rough sketch of the kind of read
loop this involves appears further below). The randomRead test gave the
following result:
09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: client-1 Finished
randomRead in 2813ms at offset 10000 for 10000 rows
09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: Finished 1 in 2813ms
writing 10000 rows
09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: client-0 Finished
randomRead in 2867ms at offset 0 for 10000 rows
09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: Finished 0 in 2867ms
writing 10000 rows
So it looks like it is taking around 280ms per record. Looking at the latest
HBase performance claims, I was expecting it to be below 10ms. Am I doing
something basically wrong, given such a huge difference :( ? Please help me fix
the latency.
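For reference, a bare-bones version of the timed random-read loop looks like
this (table, family and qualifier names are placeholders, and how the candidate
keys are sampled from the table is left out):

import java.util.List;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class RandomReadProbe {
  // Issues one Get per key and returns the average latency in milliseconds.
  static double timeReads(List<byte[]> keys) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "MyTable");  // placeholder name
    byte[] family = Bytes.toBytes("data");                           // placeholder family
    byte[] qualifier = Bytes.toBytes("value");                       // placeholder qualifier
    long start = System.currentTimeMillis();
    for (byte[] key : keys) {
      Result r = table.get(new Get(key));
      if (r.getValue(family, qualifier) == null) {                   // each value is ~4k here
        throw new IllegalStateException("missing row: " + Bytes.toString(key));
      }
    }
    return (System.currentTimeMillis() - start) / (double) keys.size();
  }
}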
The machine config is:
Processors: 2 x Xeon L5420 2.50GHz (8 cores)
Memory: 13.7GB
12 Disks of 1TB each.
Let me know if you need any more details.
Thanks,
Murali Krishna