Thanks for the log, though it is not at DEBUG. Is this a clean checkout?
What I see is a split, after which we assign the lower half of the split twice but never assign the top half. We had a bug like this after HBASE-1304 went in, but it was fixed a long time back. MSG_REGION_CLOSE_WITHOUT_REPORT is rare, but we don't seem to be doing the right thing when we get one.

St.Ack

On Sun, Jul 19, 2009 at 3:16 PM, stack <[email protected]> wrote:

> Then you would have missed this fix where edits to .META. were frozen out,
> making double-assignment more likely:
>
> ------------------------------------------------------------------------
> r794867 | stack | 2009-07-16 14:29:05 -0700 (Thu, 16 Jul 2009) | 1 line
>
> HBASE-1664 Disable 1058 on catalog tables
>
> Thanks for your patience and for living on the edge/TRUNK.
>
> St.Ack
> P.S. Would be interested in your master log nonetheless
>
> On Sun, Jul 19, 2009 at 3:07 PM, Haijun Cao <[email protected]> wrote:
>
>> Yes, as recent as: Jul 16 13:48
>>
>> Haijun
>>
>> ________________________________
>> From: stack <[email protected]>
>> To: [email protected]
>> Sent: Sunday, July 19, 2009 2:35:48 PM
>> Subject: Re: NSRE due to duplicate assignment (MSG_REGION_CLOSE_WITHOUT_REPORT)
>>
>> Are you on a recent TRUNK? A few fixes went in end of last week that help
>> with this.
>>
>> On Sun, Jul 19, 2009 at 1:24 PM, Haijun Cao <[email protected]> wrote:
>>
>> > I checked the .META. for the region, it indeed has two
>> > assignment records.
>> >
>> > I am wondering if this is a bug? How can I recover the region from this?
>> > (I searched the archive using "duplicate assignment", got no result).
>>
>> May I see the master log from around the double assignment (if you were
>> running DEBUG).
>>
>> Yeah, it's a bug.
>>
>> Do as Ryan suggested or in shell do "close_region REGIONNAME". It'll be
>> reassigned and then reopened elsewhere.
>>
>> St.Ack
>>
>> > I am on hbase trunk, hadoop-0.20.0 (plus 4681), zookeeper-3.2, test env has
>> > 3 machines (8core, 16G, 4x750G SATA disk, raid 0). DataNode xcievers=4096,
>> > handler=50, ulimit 32768 (followed hbase-0.20.0-alpha overview_description
>> > religiously)
>> >
>> > Thanks in advance.
>> >
>> > Haijun
>> >
>> > 1. Exception while scanning:
>> >
>> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
>> > region server 10.10.30.106:60020 for region
>> > TestTable,0089182778,1247979707102, row '0089182778', but failed after 10
>> > attempts.
>> > Exceptions:
>> > org.apache.hadoop.hbase.NotServingRegionException:
>> > org.apache.hadoop.hbase.NotServingRegionException:
>> > TestTable,0089182778,1247979707102
>> >     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2230)
>> >     at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1848)
>> >     at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >     at java.lang.reflect.Method.invoke(Method.java:597)
>> >     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:643)
>> >     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:913)
>> >
>> > 2. Duplicate assignments for the region in .META.
>> >
>> > Timestamp                  Event       Description
>> > Sat, 18 Jul 2009 22:05:00  open        Region opened on server: snv-it-lin-012
>> > Sat, 18 Jul 2009 22:04:57  assignment  Region assigned to server snv-it-lin-012,60020,1247965643087
>> > Sat, 18 Jul 2009 22:04:54  assignment  Region assigned to server snv-it-lin-012,60020,1247965643087
>> > Sat, 18 Jul 2009 22:04:49  split       Region split from:TestTable,0089182778,1247904130413
>> >
>> > 3. Region server log file:
>> >
>> > [hai...@snv-it-lin-012 ~]$ grep TestTable,0089182778,1247979707102 /disk1/opt/kindsight/hbase/hbase/logs/hbase-haijun-regionserver-snv-it-lin-012.log.2009-07-18
>> > 2009-07-18 22:04:54,014 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: TestTable,0089182778,1247979707102
>> > 2009-07-18 22:04:54,015 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: TestTable,0089182778,1247979707102
>> > 2009-07-18 22:04:57,085 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: TestTable,0089182778,1247979707102
>> > 2009-07-18 22:05:00,077 INFO org.apache.hadoop.hbase.regionserver.HRegion: region TestTable,0089182778,1247979707102/1884010304 available; sequence id is 57144455
>> > 2009-07-18 22:05:00,100 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: TestTable,0089182778,1247979707102
>> > 2009-07-18 22:05:03,242 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE_WITHOUT_REPORT: TestTable,0089182778,1247979707102: Duplicate assignment
>> > 2009-07-18 22:05:03,242 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_CLOSE_WITHOUT_REPORT: TestTable,0089182778,1247979707102: Duplicate assignment
>> > 2009-07-18 22:05:03,243 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed TestTable,0089182778,1247979707102
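
The region server log quoted above shows the pattern worth looking for: the same region receives MSG_REGION_OPEN twice and is then closed with MSG_REGION_CLOSE_WITHOUT_REPORT. Below is a minimal Python sketch for scanning a log for it. It assumes one record per line in the format quoted above. The function name `find_double_opens` and the regular expression are illustrative and not part of HBase.

```python
import re
from collections import Counter

# Assumption: records look like
#   "<timestamp> INFO ...HRegionServer: MSG_REGION_OPEN: <region name>"
# The "Worker: MSG_..." lines are deliberately not matched, so each
# message received from the master is counted once.
MSG = re.compile(
    r"HRegionServer: (MSG_REGION_OPEN|MSG_REGION_CLOSE_WITHOUT_REPORT): (\S+)"
)


def find_double_opens(lines):
    """Return regions that received MSG_REGION_OPEN more than once.

    The result maps region name -> {"opens": count,
    "closed_without_report": bool}.
    """
    opens = Counter()
    closed = set()
    for line in lines:
        match = MSG.search(line)
        if match is None:
            continue
        kind = match.group(1)
        # The close message is followed by ": Duplicate assignment",
        # so drop the trailing colon from the region name.
        region = match.group(2).rstrip(":")
        if kind == "MSG_REGION_OPEN":
            opens[region] += 1
        else:
            closed.add(region)
    return {
        region: {"opens": count, "closed_without_report": region in closed}
        for region, count in opens.items()
        if count > 1
    }
```

To use it on a real file, pass the open file object, for example `find_double_opens(open(path))`, where `path` is the region server log. Any region it reports is a candidate for the `close_region REGIONNAME` recovery suggested earlier in the thread.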
