On Sun, Jul 19, 2009 at 2:18 PM, Haijun Cao <[email protected]> wrote:
>
> Now I just need to find out all the bad regions and fix it this way......
> really hoping for a hbase fsck command.
If there are too many, then until there is an hbase fsck command, you might
have to restart to fix them.
To find bad regions, try scanning in the shell for a column in a column
family that you know doesn't exist ("scan 'TABLENAME', {COLUMNS =>
'NON_EXISTENT_COLUMN'}"). Make sure DEBUG is enabled on the client before you
begin. With DEBUG, you'll see the region it's trying to load before it does
so, which lets you identify the troublesome ones.
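
If the region server logs are easier to get at than client DEBUG output, a
rough sketch like the following might also help. Note this is a different
approach from the shell scan: it only looks for
MSG_REGION_CLOSE_WITHOUT_REPORT lines, and it assumes they are formatted like
the ones in the log excerpt quoted further down. The function name
find_duplicate_closes is made up for illustration; it is not part of HBase.

```python
import re
import sys

# Pattern taken from the region server log excerpt quoted in this thread.
# Assumption: other HBase versions may word this message differently.
CLOSE_RE = re.compile(
    r"MSG_REGION_CLOSE_WITHOUT_REPORT: (?P<region>.+?): Duplicate assignment"
)


def find_duplicate_closes(lines):
    """Map each region closed for 'Duplicate assignment' to the timestamp
    of the first log line that mentions it."""
    found = {}
    for line in lines:
        match = CLOSE_RE.search(line)
        if match:
            # The first 23 characters hold 'YYYY-MM-DD HH:MM:SS,mmm'.
            # The message is logged twice (once by the Worker thread),
            # so keep only the first occurrence per region.
            found.setdefault(match.group("region"), line[:23])
    return found


if __name__ == "__main__":
    # Each command-line argument is treated as a region server log file.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for region, stamp in find_duplicate_closes(handle).items():
                print(stamp, region)
```

Pass it one or more region server log files as arguments. Each region name it
prints is only a candidate; check it in .META. before applying the deleteall
on 'info:server' that Ryan describes below.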
> Back to the original cause (region got closed due to duplicate assignment
> to the same region server), is it a bug? Shall I open a ticket for it?
>
There may already be one. If you send on the master log (you can send it to
me privately), I can figure out whether this is a new condition. If the log
is not at DEBUG level, it is probably of little use. Please also name the
doubly-assigned region.
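
To help name the doubly-assigned regions, here is a minimal sketch that
counts how many times a region server received MSG_REGION_OPEN for the same
region. It assumes the log format shown in the excerpt quoted below, and
repeated_opens is an illustrative name only. A region can legitimately be
reopened on the same server over time, so treat hits as leads to check
against the master log, not as proof.

```python
import re
from collections import defaultdict

# Matches only the message as received by HRegionServer itself; the
# 'Worker: MSG_REGION_OPEN' echo is skipped because 'Worker: ' sits between.
# Assumption: log lines look like the excerpt quoted in this thread.
OPEN_RE = re.compile(
    r"^(?P<ts>\S+ \S+) INFO \S*HRegionServer: MSG_REGION_OPEN: (?P<region>.+)$"
)


def repeated_opens(lines):
    """Return {region: [timestamps]} for regions opened more than once."""
    opens = defaultdict(list)
    for line in lines:
        match = OPEN_RE.match(line.rstrip())
        if match:
            opens[match.group("region")].append(match.group("ts"))
    return {region: ts for region, ts in opens.items() if len(ts) > 1}
```

The timestamps it returns for each region are the ones worth looking up in
the master log.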
Thanks Haijun.
St.Ack
>
> Thanks.
>
> Haijun
>
>
> ________________________________
> From: Ryan Rawson <[email protected]>
> To: [email protected]
> Sent: Sunday, July 19, 2009 1:29:37 PM
> Subject: Re: NSRE due to duplicate assignment
> (MSG_REGION_CLOSE_WITHOUT_REPORT)
>
> A quick recovery is to kill your master with 'kill' (not
> hbase-daemon.sh), then restart it.
>
> If that doesn't work, you might have to manually delete the
> regionserver assignment in meta:
> deleteall '.META.', 'TestTable,0089182778,1247979707102', 'info:server'
>
> The master will reassign the region within 60 seconds.
>
> Let us know!
> -ryan
>
> On Sun, Jul 19, 2009 at 1:24 PM, Haijun Cao<[email protected]> wrote:
> >
> >
> >
> >
> > Hi
> >
> >
> > I am experiencing the NSRE exception (however, not all NSREs are created
> > equal, it seems) while scanning TestTable. TestTable was previously
> > populated with sequentialWrite 100x1M records (using the
> > PerformanceEvaluation map reduce job).
> >
> > I checked the region named in the exception and found that it is not being
> > served because the region server is complaining about a duplicate assignment:
> > MSG_REGION_CLOSE_WITHOUT_REPORT: TestTable,0089182778,1247979707102:
> > Duplicate assignment
> >
> > I checked the .META. for the region, it indeed has two
> > assignment records.
> >
> > I am wondering if this is a bug. How can I recover the region from this?
> > (I searched the archive for "duplicate assignment" and got no results.)
> >
> > I am on hbase trunk, hadoop-0.20.0 (plus 4681), zookeeper-3.2. The test env
> > has 3 machines (8 cores, 16G, 4x750G SATA disks, raid 0). DataNode
> > xcievers=4096, handler=50, ulimit 32768 (followed the hbase-0.20.0-alpha
> > overview_description religiously).
> >
> >
> > Thanks in advance.
> >
> > Haijun
> >
> >
> >
> > 1. Exception while scanning:
> >
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server 10.10.30.106:60020 for region TestTable,0089182778,1247979707102, row '0089182778', but failed after 10 attempts.
> > Exceptions:
> > org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: TestTable,0089182778,1247979707102
> >     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2230)
> >     at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1848)
> >     at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:643)
> >     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:913)
> >
> > 2. duplicate assignments for the region in .META.
> >
> > Timestamp                  Event       Description
> > Sat, 18 Jul 2009 22:05:00  open        Region opened on server: snv-it-lin-012
> > Sat, 18 Jul 2009 22:04:57  assignment  Region assigned to server snv-it-lin-012,60020,1247965643087
> > Sat, 18 Jul 2009 22:04:54  assignment  Region assigned to server snv-it-lin-012,60020,1247965643087
> > Sat, 18 Jul 2009 22:04:49  split       Region split from:TestTable,0089182778,1247904130413
> >
> > 3. Region server log file:
> >
> > [hai...@snv-it-lin-012 ~]$ grep TestTable,0089182778,1247979707102 /disk1/opt/kindsight/hbase/hbase/logs/hbase-haijun-regionserver-snv-it-lin-012.log.2009-07-18
> > 2009-07-18 22:04:54,014 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: TestTable,0089182778,1247979707102
> > 2009-07-18 22:04:54,015 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: TestTable,0089182778,1247979707102
> > 2009-07-18 22:04:57,085 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: TestTable,0089182778,1247979707102
> > 2009-07-18 22:05:00,077 INFO org.apache.hadoop.hbase.regionserver.HRegion: region TestTable,0089182778,1247979707102/1884010304 available; sequence id is 57144455
> > 2009-07-18 22:05:00,100 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: TestTable,0089182778,1247979707102
> > 2009-07-18 22:05:03,242 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE_WITHOUT_REPORT: TestTable,0089182778,1247979707102: Duplicate assignment
> > 2009-07-18 22:05:03,242 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_CLOSE_WITHOUT_REPORT: TestTable,0089182778,1247979707102: Duplicate assignment
> > 2009-07-18 22:05:03,243 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed TestTable,0089182778,1247979707102
> >