NotServingRegionExceptions are normal when they appear in the regionserver logs. They're not normal when they come out of your client code. You get an NSRE when a region gets split or reassigned and the client's cache of the region's location is out of date. Normally, the HTable client retries a number of times and it eventually gets sorted out. However, if the reassignment/splitting/etc. takes longer than all the retries, the client will get the NSRE. In general we'd like those not to happen, but I'm not sure that anything is actually wrong here.
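
If the built-in retries do run out, one option is to add a thin retry layer in your own client code. Just as a sketch (the commitWithRetries helper below is hypothetical; the package names match the 0.1.x layout and may differ on trunk, so adjust the imports to your version):

import java.io.IOException;

import org.apache.hadoop.hbase.HTable;
import org.apache.hadoop.hbase.NotServingRegionException;
import org.apache.hadoop.hbase.io.BatchUpdate;

public class RetryingCommit {
  // Hypothetical helper: retry a commit a few extra times at the
  // application level, pausing between attempts so a split or
  // reassignment has time to finish. maxAttempts should be >= 1.
  public static void commitWithRetries(HTable table, BatchUpdate bu,
      int maxAttempts, long pauseMillis) throws IOException {
    NotServingRegionException last = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        table.commit(bu);
        return;
      } catch (NotServingRegionException e) {
        // The region is still being moved or split; wait and try again.
        last = e;
        try {
          Thread.sleep(pauseMillis);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IOException("interrupted while retrying commit");
        }
      }
    }
    throw last;
  }
}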

When you say once in a while, how frequent are you talking about?

If you want to tune this problem away, you can edit your hbase-site.xml and change hbase.client.retries to be a bigger number and/or hbase.client.pause to be longer. That might resolve your issue. If something is actually broken in HBase, more retries won't help, and that would be an interesting fact to know. If it is just a timing/load issue, then more retries or a longer pause will probably fix it. This would also be a really interesting fact to know :).
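
For reference, a bump to those two settings might look roughly like the snippet below in hbase-site.xml. The values are just placeholders, and the exact property names should be double-checked against the hbase-default.xml that ships with your version (the retry count may be spelled hbase.client.retries.number there):

<!-- Example overrides only; check hbase-default.xml for the exact names and defaults. -->
<property>
  <name>hbase.client.retries.number</name>
  <value>10</value>
</property>
<property>
  <!-- pause between client retries, in milliseconds -->
  <name>hbase.client.pause</name>
  <value>30000</value>
</property>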

Glad to hear that trunk erases some of the mystery of 0.16!

-Bryan

On Apr 18, 2008, at 3:29 AM, Rong-en Fan wrote:

I'm running hbase and hadoop-0.17 trunk code as of earlier today (without HBASE-10). I'm loading 50m records into a table with ~800,000 rows and only one column family. This is a 3-node DFS with 3 region servers, and I load the data from one of these three boxes. Once in a while, I get a NotServingRegion exception. The code looks like:

BatchUpdate bu = new BatchUpdate(row);
bu.put(...);
table.commit(bu);

When I examine the region server's log, it shows something like:

08/04/18 01:51:14 open the region in question
08/04/18 01:51:15 region available
08/04/18 01:51:15 starting compaction
08/04/18 01:51:22 region closed
08/04/18 01:51:41 NotServingRegion Exception
08/04/18 01:51:47 compaction done
08/04/18 01:51:51 NotServingRegion Exception
08/04/18 01:52:01 NotServingRegion Exception
08/04/18 01:52:11 NotServingRegion Exception
08/04/18 01:52:21 NotServingRegion Exception
08/04/18 01:52:47 open the region in question
08/04/18 01:52:47 region available

The master log somehow got truncated, but IIRC the master tried to assign the region to this region server somewhere between 01:51:22 and 01:51:41.

From my understanding, this region server is a little busy, so it does not accept the assignment from the master. I'm wondering if this is caused by an overly busy region server (the request rate on each region server is about 1000 per second), and if so, which configuration variables should I tune? In addition, what are the best practices when writing a Java client to deal with such exceptions (since NotServingRegion should be common on a very busy HBase instance, I think)?

BTW, I was getting lots of different strange failures when doing the same thing on hadoop-0.16.X and hbase-0.1.X. After switching to hbase trunk, I only get the error above. It seems there are no more mysterious exceptions :-D

Thanks,
Rong-En Fan
