[ 
https://issues.apache.org/jira/browse/HBASE-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403105#comment-13403105
 ] 

Maryann Xue commented on HBASE-6289:
------------------------------------

@ramkrishna: Yes, I thought of this too, but I saw this comment before 
verifyAndAssignRoot(): "Before assign the ROOT region, ensure it haven't been 
assigned by other place". I'm not sure whether this "ROOT assigned elsewhere" 
situation can actually occur, but we do seem to have seen META assigned on 
several Region Servers at the same time when our lab's network was in chaos. 
Regardless of client cache, though, there can be only one search path for any 
region (including META and ROOT). And this is what I don't understand: why do 
we treat ROOT differently?

                
> ROOT region doesn't get re-assigned in ServerShutdownHandler if the RS is 
> still working but only the RS's ZK node expires.
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6289
>                 URL: https://issues.apache.org/jira/browse/HBASE-6289
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.6, 0.94.0
>            Reporter: Maryann Xue
>            Assignee: Maryann Xue
>            Priority: Critical
>         Attachments: HBASE-6289.patch
>
>
> The RS hosting ROOT has a network problem and its ZK node expires first, 
> which kicks off the ServerShutdownHandler. It calls verifyAndAssignRoot() to 
> try to re-assign ROOT. At that time, the RS is actually still working and 
> passes the verifyRootRegionLocation() check, so the ROOT region is skipped 
> from re-assignment.
>   private void verifyAndAssignRoot()
>   throws InterruptedException, IOException, KeeperException {
>     long timeout = this.server.getConfiguration().
>       getLong("hbase.catalog.verification.timeout", 1000);
>     if (!this.server.getCatalogTracker().verifyRootRegionLocation(timeout)) {
>       this.services.getAssignmentManager().assignRoot();
>     }
>   }
> After a few moments, this RS encounters a DFS write problem and decides to 
> abort. The RS is then soon restarted from the command line, and constantly 
> reports:
> 2012-06-27 23:13:08,627 DEBUG 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> NotServingRegionException; Region is not online: -ROOT-,,0
> 2012-06-27 23:13:08,627 DEBUG 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> NotServingRegionException; Region is not online: -ROOT-,,0
> 2012-06-27 23:13:08,628 DEBUG 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> NotServingRegionException; Region is not online: -ROOT-,,0
> 2012-06-27 23:13:08,628 DEBUG 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> NotServingRegionException; Region is not online: -ROOT-,,0
> 2012-06-27 23:13:08,630 DEBUG 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> NotServingRegionException; Region is not online: -ROOT-,,0
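
To make the failure mode in the description concrete, here is a minimal toy 
model of it. It is not HBase code: the class RootAssignSketch, the Cluster 
type, and both methods are hypothetical names invented for illustration, and 
the second method is only one conceivable alternative, not necessarily what 
the attached patch does. It shows that a verify-only check skips 
re-assignment while the expired RS still answers, whereas a check that 
refuses to trust a location on the expired server re-assigns ROOT.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the scenario in this issue. Not HBase code; all names are hypothetical. */
public class RootAssignSketch {

    /** Minimal stand-in for the cluster state as the master sees it. */
    static class Cluster {
        String rootLocation;                              // server currently hosting ROOT
        final Set<String> responsive = new HashSet<>();   // servers that still answer RPCs

        Cluster(String rootLocation) {
            this.rootLocation = rootLocation;
            responsive.add(rootLocation);                 // the RS is still working
        }

        /** Mirrors the idea of verifyRootRegionLocation(): is ROOT reachable right now? */
        boolean verifyRootLocation() {
            return rootLocation != null && responsive.contains(rootLocation);
        }
    }

    /** Logic as quoted in the issue: re-assign only if verification fails. Returns true if re-assigned. */
    static boolean verifyOnly(Cluster c, String expiredServer, String target) {
        if (!c.verifyRootLocation()) {
            c.rootLocation = target;
            return true;
        }
        return false;                                     // ROOT stays on the expired server
    }

    /** Assumed alternative: a location on the expired server never counts as valid. */
    static boolean verifyExcludingExpired(Cluster c, String expiredServer, String target) {
        boolean onExpired = expiredServer.equals(c.rootLocation);
        if (onExpired || !c.verifyRootLocation()) {
            c.rootLocation = target;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Cluster a = new Cluster("rs1");
        System.out.println("verify-only re-assigned: " + verifyOnly(a, "rs1", "rs2"));
        Cluster b = new Cluster("rs1");
        System.out.println("excluding-expired re-assigned: " + verifyExcludingExpired(b, "rs1", "rs2"));
    }
}
```

In this model the first call leaves ROOT on rs1, so once rs1 later aborts 
nothing triggers another assignment, which matches the repeated 
NotServingRegionException lines above.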
