NotServingRegionExceptions are normal when they appear in the regionserver logs. They're not normal when they come out of your client code. You get an NSRE when a region gets split or reassigned and the client's cache of the region's location is out of date. Normally, the HTable client retries a number of times and it eventually gets sorted out. However, if the reassignment/splitting/etc. takes longer than all the retries, the client will get the NSRE. In general we'd like those not to happen, but I'm not sure that anything is actually wrong here.
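
If the built-in retries do run out, one option is to add a thin retry layer in your own client code. Just as a sketch (the commitWithRetries helper below is hypothetical; the package names match the 0.1.x layout and may differ on trunk, so adjust the imports to your version):

import java.io.IOException;

import org.apache.hadoop.hbase.HTable;
import org.apache.hadoop.hbase.NotServingRegionException;
import org.apache.hadoop.hbase.io.BatchUpdate;

public class RetryingCommit {
  // Hypothetical helper: retry a commit a few extra times at the
  // application level, pausing between attempts so a split or
  // reassignment has time to finish. maxAttempts should be >= 1.
  public static void commitWithRetries(HTable table, BatchUpdate bu,
      int maxAttempts, long pauseMillis) throws IOException {
    NotServingRegionException last = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        table.commit(bu);
        return;
      } catch (NotServingRegionException e) {
        // The region is still being moved or split; wait and try again.
        last = e;
        try {
          Thread.sleep(pauseMillis);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IOException("interrupted while retrying commit");
        }
      }
    }
    throw last;
  }
}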

When you say once in a while, how frequent are you talking about?

If you want to tune this problem away, you can edit your hbase-site.xml and change hbase.client.retries to be a bigger number and/or hbase.client.pause to be longer. That might resolve your issue. If something is actually broken in HBase, more retries won't help, and that would be an interesting fact to know. If it is just a timing/load issue, then more retries or a longer pause will probably fix it. This would also be a really interesting fact to know :).
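
For reference, a bump to those two settings might look roughly like the snippet below in hbase-site.xml. The values are just placeholders, and the exact property names should be double-checked against the hbase-default.xml that ships with your version (the retry count may be spelled hbase.client.retries.number there):

<!-- Example overrides only; check hbase-default.xml for the exact names and defaults. -->
<property>
  <name>hbase.client.retries.number</name>
  <value>10</value>
</property>
<property>
  <!-- pause between client retries, in milliseconds -->
  <name>hbase.client.pause</name>
  <value>30000</value>
</property>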

Glad to hear that trunk erases some of the mystery of 0.16!

-Bryan

On Apr 18, 2008, at 3:29 AM, Rong-en Fan wrote:

I'm running hbase and hadoop-0.17 trunk code as of earlier today (without HBASE-10). I'm loading 50m records into a table with ~800,000 rows and only one column family. This is a 3-node DFS with 3 region servers, and I load the data from one of these three boxes. Once in a while, I get a NotServingRegion exception. The code looks like:

BatchUpdate bu = new BatchUpdate(row);
bu.put(...);
table.commit(bu);

When I examine the region server's log, it shows something like:

08/04/18 01:51:14 open the region in question
08/04/18 01:51:15 region available
08/04/18 01:51:15 starting compaction
08/04/18 01:51:22 region closed
08/04/18 01:51:41 NotServingRegion Exception
08/04/18 01:51:47 compaction done
08/04/18 01:51:51 NotServingRegion Exception
08/04/18 01:52:01 NotServingRegion Exception
08/04/18 01:52:11 NotServingRegion Exception
08/04/18 01:52:21 NotServingRegion Exception
08/04/18 01:52:47 open the region in question
08/04/18 01:52:47 region available

The master log somehow got truncated, but IIRC the master tried to assign the region to this region server somewhere between 01:51:22 and 01:51:41.

From my understanding, this region server is a little busy, so it does not accept the assignment from the master. I'm wondering if this is caused by an overly busy region server (the request rate on each region server is about 1000 per second), and if so, which configuration variables should I tune? In addition, what are the best practices when writing a Java client to deal with such exceptions (since NotServingRegion should be common on a very busy HBase instance, I think)?

BTW, I was getting lots of different strange failures when doing the same thing on hadoop-0.16.X and hbase-0.1.X. After switching to hbase trunk, I only get the error above. It seems there are no more mysterious exceptions :-D

Thanks,
Rong-En Fan
