On Sat, Apr 19, 2008 at 12:14 AM, Bryan Duxbury <[EMAIL PROTECTED]> wrote:
> NotServingRegionExceptions are normal when they appear in the regionserver
> logs. They're not normal when they come out of your client code. You get an
> NSRE when a region gets split or reassigned and the client's cache of the
> region's location is out of date. Normally, the HTable client retries a
> bunch, and eventually it gets sorted out. However, if the
> reassignment/splitting/etc. takes longer than all the retries, the client
> will get the NSRE. In general we'd like for those not to happen, but I'm not
> sure that there's actually something wrong.
>
> When you say once in a while, how frequent are you talking about?

Well, the first one occurred after an hour of writing and the second one a few
minutes later. However, after I sent the mail, there were no problems at all
for the next couple of hours of writing.

Regards,
Rong-En Fan
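Bryan's description above also answers the question asked later in the thread
about client-side handling: the HTable client already retries internally, and
the NSRE only reaches application code once those retries are exhausted. A
client that wants to ride out a slow split or reassignment can add one more
retry layer of its own around the commit. The sketch below is only an
illustration, assuming the 0.2-era trunk API (HTable.commit(BatchUpdate),
NotServingRegionException); the class name, maxAttempts, and pauseMs are
made-up examples, and the package locations may differ between versions.

    // A minimal sketch, not HBase's own retry logic: wrap the commit in an
    // application-level retry loop so a slow region reassignment can finish.
    // Package locations assume the 0.2-era trunk API; adjust for your tree.
    import java.io.IOException;

    import org.apache.hadoop.hbase.NotServingRegionException;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.io.BatchUpdate;

    public class RetryingCommit {
        // maxAttempts and pauseMs are arbitrary example values, not HBase settings.
        public static void commitWithRetry(HTable table, BatchUpdate bu,
                                           int maxAttempts, long pauseMs)
                throws IOException, InterruptedException {
            IOException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    table.commit(bu);   // same call as in the client code quoted below
                    return;             // write went through
                } catch (NotServingRegionException nsre) {
                    last = nsre;        // region is being moved or split; wait and retry
                } catch (IOException ioe) {
                    last = ioe;         // other transient trouble; also retry
                }
                Thread.sleep(pauseMs);
            }
            throw last;                 // still failing after all of our own attempts
        }
    }

As Bryan notes below, this only helps when the region really does come back;
if something is genuinely broken, extra retries just postpone the same failure.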
> If you want to tune this problem away, you can edit your hbase-site.xml and
> change hbase.client.retries to be a bigger number and/or hbase.client.pause
> to be longer. That might resolve your issue. If something is actually broken
> in HBase, more retries won't help, and that would be an interesting fact to
> know. If it is just a timing/load issue, then more retries or a longer pause
> will probably fix it. This would also be a really interesting fact to know
> :).
>
> Glad to hear that trunk erases some of the mystery of 0.16!
>
> -Bryan
>
> On Apr 18, 2008, at 3:29 AM, Rong-en Fan wrote:
>
> > I'm running hbase and hadoop-0.17 trunk code as of earlier today (without
> > HBASE-10). I'm loading 50M records into a table with ~800,000 rows and
> > only one column family. This is a 3-node DFS with 3 region servers, and I
> > load the data from one of these three boxes. Once in a while, I get a
> > NotServingRegionException. The code looks like:
> >
> > BatchUpdate bu = new BatchUpdate(row);
> > bu.put(...);
> > table.commit(bu);
> >
> > When I examine the region server's log, it shows something like:
> >
> > 08/04/18 01:51:14 open the region in question
> > 08/04/18 01:51:15 region available
> > 08/04/18 01:51:15 starting compaction
> > 08/04/18 01:51:22 region closed
> > 08/04/18 01:51:41 NotServingRegionException
> > 08/04/18 01:51:47 compaction done
> > 08/04/18 01:51:51 NotServingRegionException
> > 08/04/18 01:52:01 NotServingRegionException
> > 08/04/18 01:52:11 NotServingRegionException
> > 08/04/18 01:52:21 NotServingRegionException
> > 08/04/18 01:52:47 open the region in question
> > 08/04/18 01:52:47 region available
> >
> > The master log somehow got truncated; IIRC, the master tried to assign the
> > region to this region server somewhere between 01:51:22 and 01:51:41.
> >
> > From my understanding, this region server is a little busy, so it does not
> > accept the assignment from the master. I'm wondering if this is caused by
> > an overly busy region server (the requests per second on each region server
> > are about 1000), and if so, which configuration variables should I tune?
> > In addition, what would be the best practice when writing a Java client to
> > deal with such exceptions (as NotServingRegionException should be common
> > on a very busy HBase instance, I think)?
> >
> > BTW, I was getting lots of different strange failures when doing the same
> > thing on hadoop-0.16.X and hbase-0.1.X. After switching to hbase trunk,
> > I only get the error above. It seems there are no more mysterious
> > exceptions :-D
> >
> > Thanks,
> > Rong-En Fan
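For reference, the hbase-site.xml override Bryan suggests might look roughly
like the sketch below. The values are arbitrary examples, and the exact
property keys should be checked against your hbase-default.xml: Bryan writes
hbase.client.retries, and in some versions the retry count appears as
hbase.client.retries.number; hbase.client.pause is in milliseconds.

    <!-- A hedged sketch of the suggested hbase-site.xml overrides. Property
         names and defaults vary by version; verify them in hbase-default.xml. -->
    <configuration>
      <property>
        <name>hbase.client.retries.number</name>  <!-- Bryan's "hbase.client.retries" -->
        <value>10</value>
      </property>
      <property>
        <name>hbase.client.pause</name>
        <value>30000</value>
      </property>
    </configuration>

A longer pause or a larger retry count simply gives a split or reassignment
more time to finish before the exception reaches the client.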
