Murali,
Which version of HBase are you running?
There was a fix that was just committed a few days ago for a bug that
manifested as null/empty HRI.
It has been fixed in RC2, so I recommend upgrading to that and trying
your upload again.
JG
Murali Krishna. P wrote:
Thanks for the clarification. I changed ROW_LENGTH as you suggested and used
the sequenceWrite + randomRead combination to benchmark. The initial results
were impressive, even though I would like to see the last column improved.
randomRead
==========
totalrows \ nclients (--rows)   5 (10000)   50 (10000)   100 (10000)   1000 (10000)
800k                            0.4ms       3.5ms        6.5ms         55ms
2.3m                            0.45ms      3.5ms        6.6ms         56ms
The only change in the config was that the handler count was increased to 1000.
I assume there are some parameters that can be tweaked to improve this further?
My goal is to test it with 10 million rows on this box. For some reason the
sequenceWrite job with 5,000,000 rows + 2 clients failed, with the following
exception:
09/08/19 00:34:07 INFO mapred.LocalJobRunner: 2000000/2050000/2500000
09/08/19 00:50:38 WARN mapred.LocalJobRunner: job_local_0001
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
region server Some server for region , row '0002076131', but failed after 11
attempts.
Exceptions:
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
java.io.IOException: HRegionInfo was null or empty in .META.
at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocationForRowWithRetries(HConnectionManager.java:995)
at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1064)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:584)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:450)
at
org.evaluation.hbase.PerformanceEvaluation$SequentialWriteTest.testRow(PerformanceEvaluation.java:736)
at
org.evaluation.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:571)
at
org.evaluation.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:804)
at
org.evaluation.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:350)
at
org.evaluation.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:326)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:518)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:303)
at
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)
From the region server log:
2009-08-19 00:47:22,740 WARN org.apache.hadoop.hbase.regionserver.Store: Not in
setorg.apache.hadoop.hbase.regionserver.storescan...@1e458ae5
2009-08-19 00:48:22,741 WARN org.apache.hadoop.hbase.regionserver.Store: Not in
setorg.apache.hadoop.hbase.regionserver.storescan...@4b28029c
2009-08-19 00:49:22,743 WARN org.apache.hadoop.hbase.regionserver.Store: Not in
setorg.apache.hadoop.hbase.regionserver.storescan...@33fb11e0
2009-08-19 00:50:22,745 WARN org.apache.hadoop.hbase.regionserver.Store: Not in
setorg.apache.hadoop.hbase.regionserver.storescan...@6ceccc3b
2009-08-19 00:51:22,746 WARN org.apache.hadoop.hbase.regionserver.Store: Not in
setorg.apache.hadoop.hbase.regionserver.storescan...@44a3b5b4
Thanks,
Murali Krishna
________________________________
From: Jonathan Gray <[email protected]>
To: [email protected]
Sent: Wednesday, 19 August, 2009 12:26:55 AM
Subject: Re: HBase-0.20.0 randomRead
With all that memory, you're likely seeing such good performance because
of filesystem caching. As you say, 2ms is extraordinarily fast for a
disk read, but since your rows are relatively small, you are loading up
all that data into memory (not only the filesystem cache, but also HBase's
block cache, which makes it even faster).
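If you want to take the block cache out of the equation when benchmarking, it
can be disabled per column family when the table is created. A minimal sketch,
assuming the 0.20 client API and made-up table/family names (the OS filesystem
cache would of course still be warm):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateNoCacheTable {
  public static void main(String[] args) throws Exception {
    // Made-up table and family names -- substitute your own.
    HTableDescriptor desc = new HTableDescriptor("BenchTable");
    HColumnDescriptor family = new HColumnDescriptor("info");
    family.setBlockCacheEnabled(false);  // reads will not be served from the block cache
    desc.addFamily(family);
    new HBaseAdmin(new HBaseConfiguration()).createTable(desc);
  }
}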
JG
Jean-Daniel Cryans wrote:
Well, it seems there's something wrong with the way you modified PE. It
is not really testing your table unless the row keys are built the
same way as TestTable's; to me it seems that you are testing on only
20000 rows, so caching is easy. A better test would be to just use PE
the way it currently is, but with ROW_LENGTH = 4k.
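To make 'built the same way as TestTable is' concrete: the stock 0.20
PerformanceEvaluation writes rows whose keys are integers zero-padded to ten
digits (the row '0002076131' in your error has that shape) and whose values are
ROW_LENGTH random bytes. Roughly, as a sketch (your modified copy may differ):

public class PeDataShape {
  // Value size used by the write tests; 4 * 1024 approximates the ~4k real values.
  static final int ROW_LENGTH = 4 * 1024;

  // Fixed-width, zero-padded row key, e.g. 2076131 -> "0002076131".
  public static byte[] format(final int number) {
    byte[] b = new byte[10];
    int d = Math.abs(number);
    for (int i = b.length - 1; i >= 0; i--) {
      b[i] = (byte) ((d % 10) + '0');
      d /= 10;
    }
    return b;
  }
}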
WRT Jetty, make sure you optimized it with
http://jetty.mortbay.org/jetty5/doc/optimization.html
J-D
On Tue, Aug 18, 2009 at 12:08 PM, Murali Krishna.
P<[email protected]> wrote:
Ahh, my mistake, I just took it as seconds.
Now I wonder whether it can really be that fast?? Won't it take at least 2ms
for a disk read? (I have given 8G heap space to the RegionServer; is it caching
that much?) Has anyone seen these kinds of numbers?
Actually, my initial problem was that I have a Jetty in front of this HBase to
serve this 4k value, and when benchmarked, it took 200+ milliseconds for each
record with 100 clients. That is why I decided to benchmark without Jetty first.
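The serving path behind Jetty is roughly like the sketch below (table, family
and qualifier names are placeholders, not the real code); the sketch keeps one
HTable per worker thread, since HTable is not thread-safe and is costly to
construct:

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ValueServlet extends HttpServlet {
  // One HTable per worker thread; HTable is not thread-safe and is expensive to build.
  private final ThreadLocal<HTable> table = new ThreadLocal<HTable>() {
    @Override protected HTable initialValue() {
      try {
        return new HTable(new HBaseConfiguration(), "ValueTable");  // placeholder table name
      } catch (IOException e) {
        throw new RuntimeException(e);
      }
    }
  };

  @Override
  protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
    byte[] key = Bytes.toBytes(req.getParameter("key"));
    Result r = table.get().get(new Get(key));
    byte[] value = r.getValue(Bytes.toBytes("data"), Bytes.toBytes("value"));  // placeholder family/qualifier
    if (value == null) {
      resp.sendError(HttpServletResponse.SC_NOT_FOUND);
      return;
    }
    resp.getOutputStream().write(value);  // the ~4k payload
  }
}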
Thanks,
Murali Krishna
________________________________
From: Jean-Daniel Cryans <[email protected]>
To: [email protected]
Sent: Tuesday, 18 August, 2009 9:13:40 PM
Subject: Re: HBase-0.20.0 randomRead
Murali,
I'm not reading the same thing as you.
client-0 Finished randomRead in 2867ms at offset 0 for 10000 rows
That means 2867 / 10000 = 0.2867ms per row. It's kinda fast.
J-D
On Tue, Aug 18, 2009 at 11:35 AM, Murali Krishna.
P<[email protected]> wrote:
Hi all,
(Saw a related thread on performance, but starting a different one because my
setup is slightly different).
I have a one-node setup with hbase-0.20 (alpha). It has around 11 million rows
across ~250 regions, each row with a ~20-byte key and a ~4k value.
Since my primary concern is randomRead, I modified the PerformanceEvaluation
code to read from this particular table (a rough sketch of the kind of read
loop this involves appears further below). The randomRead test gave the
following result:
09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: client-1 Finished
randomRead in 2813ms at offset 10000 for 10000 rows
09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: Finished 1 in 2813ms
writing 10000 rows
09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: client-0 Finished
randomRead in 2867ms at offset 0 for 10000 rows
09/08/18 08:20:41 INFO hbase.PerformanceEvaluation: Finished 0 in 2867ms
writing 10000 rows
So it looks like it is taking around 280ms per record. Looking at the latest
HBase performance claims, I was expecting it to be below 10ms. Am I doing
something basically wrong, given such a huge difference :( ? Please help me fix
the latency.
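For reference, a bare-bones version of the timed random-read loop looks like
this (table, family and qualifier names are placeholders, and how the candidate
keys are sampled from the table is left out):

import java.util.List;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class RandomReadProbe {
  // Issues one Get per key and returns the average latency in milliseconds.
  static double timeReads(List<byte[]> keys) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "MyTable");  // placeholder name
    byte[] family = Bytes.toBytes("data");                           // placeholder family
    byte[] qualifier = Bytes.toBytes("value");                       // placeholder qualifier
    long start = System.currentTimeMillis();
    for (byte[] key : keys) {
      Result r = table.get(new Get(key));
      if (r.getValue(family, qualifier) == null) {                   // each value is ~4k here
        throw new IllegalStateException("missing row: " + Bytes.toString(key));
      }
    }
    return (System.currentTimeMillis() - start) / (double) keys.size();
  }
}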
The machine config is:
Processors: 2 x Xeon L5420 2.50GHz (8 cores)
Memory: 13.7GB
12 Disks of 1TB each.
Let me know if you need any more details.
Thanks,
Murali Krishna