[ 
https://issues.apache.org/jira/browse/HBASE-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146111#comment-13146111
 ] 

mingjian commented on HBASE-3914:
---------------------------------

@stack: The following is our master log:
{noformat} 
2011-10-19 19:13:34,873 ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while processing event M_META_SERVER_SHUTDOWN
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.ipc.ServerNotRunningException: Server is not running yet
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1090)

        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:771)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:256)
        at $Proxy7.getRegionInfo(Unknown Source)
        at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRegionLocation(CatalogTracker.java:424)
        at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:471)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:90)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:126)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
{noformat} 

After this, the -ROOT- region is never assigned, as the following log shows:
{noformat} 
2011-10-19 19:18:40,000 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: locateRegionInMeta parentTable=-ROOT-, metaLocation=address: dw79.kgb.sqa.cm4:60020, regioninfo: -ROOT-,,0.70236052, attempt=0 of 10 failed; retrying after sleep of 1000 because: org.apache.hadoop.hbase.NotServingRegionException: Region is not online: -ROOT-,,0
        at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2771)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1802)
        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:569)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1091)
{noformat}
So we should rewrite the verify method in both branch-0.90 and trunk.
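
To make the idea concrete, here is a minimal sketch of one possible reading of "rewrite the verify method": when the server at the recorded location answers "not running yet", the verify step reports "not serving" instead of letting the exception kill the shutdown handler. This is an assumption about the intended fix, not the actual patch. RegionProbe, TolerantVerifier, and ServerNotRunningYetException are hypothetical stand-ins defined locally; they are not HBase classes.

```java
import java.io.IOException;

/** Local stand-in for the "Server is not running yet" failure seen in the master log. */
class ServerNotRunningYetException extends IOException {
    ServerNotRunningYetException(String msg) {
        super(msg);
    }
}

/** Hypothetical probe: asks the server at the recorded location about a region. */
interface RegionProbe {
    void getRegionInfo(String regionName) throws IOException;
}

final class TolerantVerifier {
    private TolerantVerifier() {
    }

    /**
     * Returns true only if the probe succeeds. A server that is still
     * starting up is treated as "not serving the region", so the caller can
     * go on to assign the region. Any other IOException still propagates.
     */
    static boolean verifyRegionLocation(RegionProbe probe, String regionName)
            throws IOException {
        try {
            probe.getRegionInfo(regionName);
            return true;
        } catch (ServerNotRunningYetException e) {
            // Assumption: a server that is not running yet cannot be hosting the region.
            return false;
        }
    }
}
```

With this shape, a caller such as verifyAndAssignRoot would see false and continue to the assignment rather than aborting the event. Whether other remote failures should also map to false is a separate decision that this sketch does not make.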
                
> ROOT region appeared in two regionserver's onlineRegions at the same time
> -------------------------------------------------------------------------
>
>                 Key: HBASE-3914
>                 URL: https://issues.apache.org/jira/browse/HBASE-3914
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.3
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.4
>
>         Attachments: HBASE-3914-V2.patch, HBASE-3914.patch
>
>
> This can happen, with low probability, through the following steps
> (assume the cluster nodes are named RS1/RS2/HM, and the cluster has more than
> 10,000 regions):
> 1. The ROOT region was opened on RS1.
> 2. For some reason (perhaps the HDFS process became abnormal), RS1 aborted.
> 3. The ServerShutdownHandler for RS1 started processing.
> 4. HMaster was restarted. While handling finishInitialization, the ROOT
> region was unset and assigned to RS2.
> 5. The ROOT region was opened successfully on RS2.
> 6. After a while, the ROOT region was unset again by RS1's
> ServerShutdownHandler and then reassigned. Before that, RS1 had been
> restarted. So there are two possibilities:
>  Case a:
>    The ROOT region was assigned to RS1.
>    Nothing seemed to be affected, but the ROOT region was still online
> on RS2.
>
>  Case b:
>    The ROOT region was assigned to RS2.
>    The ROOT region could not be opened until it was reassigned to another
> regionserver, because it was already shown as online on this regionserver.
> This can be confirmed from the logs:
> 1. The ROOT region was opened twice:
> 2011-05-17 10:32:59,188 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region -ROOT-,,0.70236052 on 162-2-77-0,20020,1305598359031
> 2011-05-17 10:33:01,536 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region -ROOT-,,0.70236052 on 162-2-16-6,20020,1305597548212
> 2. Regionserver 162-2-16-6 aborted, so the ROOT region was reassigned to
> 162-2-77-0, but it was already online on that server:
> 10:49:30,920 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: -ROOT-,,0.70236052
> 10:49:30,920 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing open of -ROOT-,,0.70236052
> 10:49:30,920 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Attempted open of -ROOT-,,0.70236052 but already online on this server
> This could leave the ROOT region offline for a long time, even though it
> happens only under a special scenario. I have checked the code, and there
> seems to be a small bug here.
> There are two call sites of assignRoot():
> 1.
> HMaster# assignRootAndMeta:
>     if (!catalogTracker.verifyRootRegionLocation(timeout)) {
>       this.assignmentManager.assignRoot();
>       this.catalogTracker.waitForRoot();
>       assigned++;
>     }
> 2.
> ServerShutdownHandler# process:
>     if (isCarryingRoot()) { // -ROOT-
>       try {
>         this.services.getAssignmentManager().assignRoot();
>       } catch (KeeperException e) {
>         this.server.abort("In server shutdown processing, assigning root", e);
>         throw new IOException("Aborting", e);
>       }
>     }
> I think that each time assignRoot() is called, we should verify the ROOT
> region's location first, because by the time of the assignment the ROOT
> region may already have been assigned from another place.
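
The proposal in the quoted description (verify the ROOT location before every assignRoot() call) could be sketched roughly as below. This is only an illustration under stated assumptions: RootLocationVerifier, RootAssigner, and GuardedRootAssignment are hypothetical names introduced here, standing in for the catalog check and the assignment call, and are not the real HBase interfaces.

```java
/** Hypothetical stand-in for the catalog check; not the real CatalogTracker API. */
interface RootLocationVerifier {
    /** Returns true if -ROOT- is confirmed online at its recorded location. */
    boolean verifyRootRegionLocation(long timeoutMs);
}

/** Hypothetical stand-in for the assignment call used by both call sites. */
interface RootAssigner {
    void assignRoot();
}

final class GuardedRootAssignment {
    private final RootLocationVerifier verifier;
    private final RootAssigner assigner;

    GuardedRootAssignment(RootLocationVerifier verifier, RootAssigner assigner) {
        this.verifier = verifier;
        this.assigner = assigner;
    }

    /**
     * Assigns -ROOT- only when verification says it is not being served.
     * Returns true if an assignment was issued, false if it was skipped.
     */
    boolean assignRootIfNotServing(long timeoutMs) {
        if (verifier.verifyRootRegionLocation(timeoutMs)) {
            // Already online somewhere (for example, assigned during master
            // initialization), so do not unset and reassign it.
            return false;
        }
        assigner.assignRoot();
        return true;
    }
}
```

Routing both the master-initialization path and the server-shutdown path through one guarded method would keep the check from being forgotten at either call site. Note that check-then-assign is not atomic on its own; this sketch does not address two callers racing between the check and the assignment.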


        
