Ryan,

Thank you for your advice.
I tried both approaches. Restarting the master did not work; deleting info:server from .META. worked.

One minor thing: I used the deleteall command first, and it turned out this deleted the region (including info:regioninfo) completely. That region is lost. Luckily, I had another region with the same problem. I tried the delete command (not deleteall) on it, and it worked as you described: the region was reassigned and opened successfully on a region server, and I can get rows within the region. Now I just need to find all the bad regions and fix them this way... really hoping for an hbase fsck command.

Back to the original cause (the region got closed due to duplicate assignment to the same region server): is it a bug? Shall I open a ticket for it?

Thanks.
Haijun

________________________________
From: Ryan Rawson <[email protected]>
To: [email protected]
Sent: Sunday, July 19, 2009 1:29:37 PM
Subject: Re: NSRE due to duplicate assignment (MSG_REGION_CLOSE_WITHOUT_REPORT)

A quick recovery is to kill your master with 'kill' (not hbase-daemon.sh), then restart it.

If that doesn't work, you might have to manually delete the regionserver assignment in meta:

deleteall '.META.', 'TestTable,0089182778,1247979707102', 'info:server'

The master will reassign the region within 60 seconds.

Let us know!
-ryan

On Sun, Jul 19, 2009 at 1:24 PM, Haijun Cao <[email protected]> wrote:
>
> Hi
>
> I am experiencing the NSRE exception (however, not all NSREs are created
> equal, so it seems) while scanning TestTable. TestTable was previously
> populated with sequentialWrite 100x1M records (using the PerformanceEvaluation
> MapReduce job).
>
> I checked the region in the exception and found that the region is not served
> because the region server is complaining about duplicate assignment:
> MSG_REGION_CLOSE_WITHOUT_REPORT: TestTable,0089182778,1247979707102:
> Duplicate assignment
>
> I checked .META. for the region, and it indeed has two assignment records.
>
> I am wondering if this is a bug?
> How can I recover the region from this? (I searched the archive for
> "duplicate assignment" and got no results.)
>
> I am on hbase trunk, hadoop-0.20.0 (plus 4681), zookeeper-3.2. The test env
> has 3 machines (8 cores, 16G, 4x750G SATA disks, RAID 0). DataNode
> xcievers=4096, handler=50, ulimit 32768 (I followed the hbase-0.20.0-alpha
> overview_description religiously).
>
> Thanks in advance.
>
> Haijun
>
> 1. Exception while scanning:
>
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
> region server 10.10.30.106:60020 for region
> TestTable,0089182778,1247979707102, row '0089182778', but failed after 10
> attempts.
> Exceptions:
> org.apache.hadoop.hbase.NotServingRegionException:
> org.apache.hadoop.hbase.NotServingRegionException:
> TestTable,0089182778,1247979707102
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2230)
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1848)
>   at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:643)
>   at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:913)
>
> 2. Duplicate assignments for the region in .META.:
>
> Timestamp                  Event       Description
> Sat, 18 Jul 2009 22:05:00  open        Region opened on server: snv-it-lin-012
> Sat, 18 Jul 2009 22:04:57  assignment  Region assigned to server snv-it-lin-012,60020,1247965643087
> Sat, 18 Jul 2009 22:04:54  assignment  Region assigned to server snv-it-lin-012,60020,1247965643087
> Sat, 18 Jul 2009 22:04:49  split       Region split from: TestTable,0089182778,1247904130413
>
> 3. Region server log file:
>
> [hai...@snv-it-lin-012 ~]$ grep TestTable,0089182778,1247979707102 /disk1/opt/kindsight/hbase/hbase/logs/hbase-haijun-regionserver-snv-it-lin-012.log.2009-07-18
> 2009-07-18 22:04:54,014 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: TestTable,0089182778,1247979707102
> 2009-07-18 22:04:54,015 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: TestTable,0089182778,1247979707102
> 2009-07-18 22:04:57,085 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: TestTable,0089182778,1247979707102
> 2009-07-18 22:05:00,077 INFO org.apache.hadoop.hbase.regionserver.HRegion: region TestTable,0089182778,1247979707102/1884010304 available; sequence id is 57144455
> 2009-07-18 22:05:00,100 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: TestTable,0089182778,1247979707102
> 2009-07-18 22:05:03,242 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE_WITHOUT_REPORT: TestTable,0089182778,1247979707102: Duplicate assignment
> 2009-07-18 22:05:03,242 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_CLOSE_WITHOUT_REPORT: TestTable,0089182778,1247979707102: Duplicate assignment
> 2009-07-18 22:05:03,243 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed TestTable,0089182778,1247979707102
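To find the other bad regions, the "MSG_REGION_CLOSE_WITHOUT_REPORT: <region>: Duplicate assignment" lines in the regionserver log excerpt above are one usable signature. Below is a minimal sketch in Ruby (the language the HBase shell is built on) that collects those region names and prints the per-region shell command. The script name and the helpers `duplicate_regions` and `delete_command` are hypothetical, not part of HBase. It emits `delete` rather than `deleteall` because, per the report in this thread, `deleteall` wiped the whole region row while `delete` removed only info:server. It assumes each log entry sits on one line, as in the raw log file.

```ruby
# Hypothetical helper, not part of HBase: find regions that a regionserver
# closed because of "Duplicate assignment" and print HBase shell commands that
# delete only their info:server cell in .META.

# Matches the region name in lines such as:
#   ... MSG_REGION_CLOSE_WITHOUT_REPORT: TestTable,0089182778,1247979707102: Duplicate assignment
DUPLICATE_RE = /MSG_REGION_CLOSE_WITHOUT_REPORT: (\S+): Duplicate assignment/

# Returns unique region names in first-seen order. The HRegionServer line and
# its "Worker:" twin both match, so duplicates are removed.
def duplicate_regions(lines)
  lines.map { |line| line[DUPLICATE_RE, 1] }.compact.uniq
end

# Builds the shell command. Uses `delete` (not `deleteall`), which per this
# thread removes just the info:server cell and leaves info:regioninfo intact.
def delete_command(region)
  "delete '.META.', '#{region}', 'info:server'"
end

# Runs only when given log files, e.g.:
#   ruby find_dup_regions.rb hbase-*-regionserver-*.log*
if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  duplicate_regions(ARGF.each_line).each { |region| puts delete_command(region) }
end
```

The output is a list to review, not to paste blindly: a region may already have been reassigned or split since the log entry was written, so check its current .META. row (for example with a shell `get`) before running each delete.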
