I've seen this too. It seems to correlate with uneven block placement in HDFS, so maybe it's caching of a hot region.
________________________________
From: stack <[email protected]>
To: [email protected]
Sent: Friday, May 15, 2009 1:38:22 PM
Subject: Re: Loading large resultset into HBase

On Fri, May 15, 2009 at 1:11 PM, llpind <[email protected]> wrote:
>
> Hey all,
>
> I'm loading data from a DB into HBase. I have a single Java process
> iterating over a ResultSet. After about 10,000 rows I do a BatchUpdate.
> I've changed the heap size of both Hadoop and HBase to 2000.
>
> Setup: 0.19.1. 1 box with master and secondary. 3 boxes with
> HRegionServer.
>
> Problem 1: The load seems to be unbalanced:
>
> Address   Start Code     Load
> 1:60020   1242415770566  requests=0, regions=1, usedHeap=39,  maxHeap=1777
> 2:60020   1242415770417  requests=3, regions=3, usedHeap=52,  maxHeap=1777
> 3:60020   1242415770273  requests=1, regions=3, usedHeap=604, maxHeap=1777
> Total: servers: 3, requests=4, regions=7
>

Yeah, it looks unbalanced memory-wise but not too far off region-wise. What
happens if you load more into your cluster? And used heap fluctuates,
doesn't it?

> Problem 2: Around 10 million rows, the upload starts to slow down.
>

Turn on DEBUG logging -- see the FAQ on the wiki for how -- and then tail
one of your regionserver logs. What's going on?

It might be the fact that the commit log buffer is small in HBase 0.19.1.
I found this issue recently in
https://issues.apache.org/jira/browse/HBASE-1394. Change
hbase.regionserver.hlog.blocksize from its 1MB size up to, say, 64MB?

> The upload is still going, so I'll update on what happens.

Please do.
St.Ack

> --
> View this message in context:
> http://www.nabble.com/Loading-large-resultset-into-HBase-tp23566568p23566568.html
> Sent from the HBase User mailing list archive at Nabble.com.
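For what it's worth, a sketch of what that override might look like in hbase-site.xml on the regionservers -- the property name and the 1MB-to-64MB suggestion are from stack's message; the byte value 67108864 (64 * 1024 * 1024) is my arithmetic, so double-check it against your version's docs:

```xml
<!-- hbase-site.xml: raise the commit-log block size from the 1MB
     default discussed in HBASE-1394 up to 64MB (value is in bytes). -->
<property>
  <name>hbase.regionserver.hlog.blocksize</name>
  <value>67108864</value>
</property>
```

Restart the regionservers after changing it so the new value takes effect.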
