I've seen this too. It seems to correlate with uneven block placement in HDFS, so maybe it's caching of a hot region.
________________________________
From: stack <[email protected]>
To: [email protected]
Sent: Friday, May 15, 2009 1:38:22 PM
Subject: Re: Loading large resultset into HBase

On Fri, May 15, 2009 at 1:11 PM, llpind <[email protected]> wrote:
>
> Hey all,
>
> I'm loading data from a DB into HBase. I have a single Java process
> iterating over a ResultSet. After about 10,000 rows I do a BatchUpdate.
> I've changed the heap size of both Hadoop and HBase to 2000.
>
> Setup: 0.19.1. 1 box with master and secondary. 3 boxes with
> HRegionServer.
>
> Problem 1: The load seems to be unbalanced:
>
> Address   Start Code     Load
> 1:60020   1242415770566  requests=0, regions=1, usedHeap=39,  maxHeap=1777
> 2:60020   1242415770417  requests=3, regions=3, usedHeap=52,  maxHeap=1777
> 3:60020   1242415770273  requests=1, regions=3, usedHeap=604, maxHeap=1777
> Total: servers: 3, requests=4, regions=7
>

Yeah, it looks unbalanced memory-wise but not too far off region-wise. What
happens if you load more into your cluster? And used heap fluctuates,
doesn't it?

> Problem 2: Around 10 million rows, the upload starts to slow down.
>

Turn on DEBUG logging -- see the FAQ on the wiki for how -- and then tail
one of your regionserver logs. What's going on?

It might be the fact that the commit log buffer is small in HBase 0.19.1.
I found this issue recently in
https://issues.apache.org/jira/browse/HBASE-1394. Change
hbase.regionserver.hlog.blocksize from its 1MB size up to, say, 64MB?

> The upload is still going, so I'll update on what happens.

Please do.
St.Ack

> --
> View this message in context:
> http://www.nabble.com/Loading-large-resultset-into-HBase-tp23566568p23566568.html
> Sent from the HBase User mailing list archive at Nabble.com.
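For what it's worth, a sketch of what that override might look like in hbase-site.xml on the regionservers -- the property name and the 1MB-to-64MB suggestion are from stack's message; the byte value 67108864 (64 * 1024 * 1024) is my arithmetic, so double-check it against your version's docs:

```xml
<!-- hbase-site.xml: raise the commit-log block size from the 1MB
     default discussed in HBASE-1394 up to 64MB (value is in bytes). -->
<property>
  <name>hbase.regionserver.hlog.blocksize</name>
  <value>67108864</value>
</property>
```

Restart the regionservers after changing it so the new value takes effect.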
