Pankaj Kumar created HBASE-16805:
------------------------------------

             Summary: HMaster may send reportForDuty himself while shutting down
                 Key: HBASE-16805
                 URL: https://issues.apache.org/jira/browse/HBASE-16805
             Project: HBase
          Issue Type: Bug
          Components: master
            Reporter: Pankaj Kumar
            Assignee: Pankaj Kumar
            Priority: Minor


We hit an interesting scenario in which HMaster sent reportForDuty to itself 
while shutting down. 

Initially HMaster registered itself as the active master, but it could not 
finish its initialization because, for some reason, the namespace table was not 
assigned within the specified time:
{noformat}
2016-07-30 19:36:52,161 | FATAL | hadoopc1h2:21300.activeMasterManager | Failed 
to become active master | 
org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1610)
java.io.IOException: Timedout 300000ms waiting for namespace table to be 
assigned
        at 
org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:102)
        at 
org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:977)
        at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:763)
        at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:171)
        at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1606)
        at java.lang.Thread.run(Thread.java:745)
2016-07-30 19:36:52,162 | FATAL | hadoopc1h2:21300.activeMasterManager | Master 
server abort: loaded coprocessors are: 
[org.apache.hadoop.hbase.security.access.AccessController, 
org.apache.hadoop.hbase.index.coprocessor.master.IndexMasterObserver, 
org.apache.hadoop.hbase.JMXListener] | 
org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:1981)
2016-07-30 19:36:52,162 | FATAL | hadoopc1h2:21300.activeMasterManager | 
Unhandled exception. Starting shutdown. | 
org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:1984)
java.io.IOException: Timedout 300000ms waiting for namespace table to be 
assigned
        at 
org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:102)
        at 
org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:977)
        at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:763)
        at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:171)
        at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1606)
        at java.lang.Thread.run(Thread.java:745)
2016-07-30 19:36:52,187 | INFO  | master/hadoopc1h2/172.16.19.51:21300 | 
reportForDuty to master=hadoopc1h2,21300,1469877905979 with port=21300, 
startcode=1469877905979 | 
org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2271)
2016-07-30 19:36:52,198 | INFO  | hadoopc1h2:21300.activeMasterManager | 
ConnectorServer stopped! | 
org.apache.hadoop.hbase.JMXListener.stopConnectorServer(JMXListener.java:160)
{noformat}
In the second-to-last log line above, HMaster sent reportForDuty to itself.


Background:
1) During master startup, HMasterCommandLine constructs the HMaster, which 
starts another thread that waits to become the active master:
{code}
        startActiveMasterManager(infoPort);
{code}
 
2) At the same time, after constructing the HMaster, HMasterCommandLine starts 
the HMaster thread:
{code}
        HMaster master = HMaster.constructMaster(masterClass, conf, csm);
        if (master.isStopped()) {
          LOG.info("Won't bring the Master up as a shutdown is requested");
          return 1;
        }
        master.start();
        master.join();
{code}
which then waits in the following code path:
{noformat}
        HRegionServer
                run()
                   preRegistrationInitialization()
                      initializeZooKeeper()
                        waitForMasterActive()
{noformat}

3) In HMaster,
{code}
  protected void waitForMasterActive(){
    boolean tablesOnMaster = BaseLoadBalancer.tablesOnMaster(conf);
    while (!(tablesOnMaster && isActiveMaster)
        && !isStopped() && !isAborted()) {
      sleeper.sleep();
    }
  }
{code}
Because "hbase.balancer.tablesOnMaster" is not configured, tablesOnMaster is 
false and becoming the active master never ends this loop, so HMaster waits 
here until it is stopped or aborted.

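The exit condition of that loop can be checked in isolation. The following is 
only a minimal sketch: the class WaitForMasterActiveSketch and the method 
keepWaiting are hypothetical names, not HBase code, and the four flags are 
passed in as plain booleans instead of being read from the server state.

```java
// Hypothetical stand-in for the loop condition quoted above; not the HBase class.
final class WaitForMasterActiveSketch {

  /** Returns true while the waiting thread would call sleeper.sleep() again. */
  static boolean keepWaiting(boolean tablesOnMaster, boolean isActiveMaster,
                             boolean stopped, boolean aborted) {
    return !(tablesOnMaster && isActiveMaster) && !stopped && !aborted;
  }

  public static void main(String[] args) {
    // tablesOnMaster is false: becoming active does not end the wait.
    System.out.println(keepWaiting(false, true, false, false)); // true
    // An abort ends the wait even though tablesOnMaster is false.
    System.out.println(keepWaiting(false, true, false, true));  // false
    // With tablesOnMaster set, becoming active ends the wait.
    System.out.println(keepWaiting(true, true, false, false));  // false
  }
}
```

With tablesOnMaster false, the only ways out of the loop are a stop or an 
abort, which is the situation described in this report.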

When HMaster fails to complete its initialization (because the namespace table 
was not assigned), it aborts:
{noformat}
        abort("Unhandled exception. Starting shutdown.", t);
{noformat}

So on HMaster abort the thread from step 2 stops waiting and, as it continues, 
sends reportForDuty to the active master, which is this same HMaster:
{code}
      // Try and register with the Master; tell it we are here.  Break if
      // server is stopped or the clusterup flag is down or hdfs went wacky.
      while (keepLooping()) {
        RegionServerStartupResponse w = reportForDuty();
        if (w == null) {
          LOG.warn("reportForDuty failed; sleeping and then retrying.");
          this.sleeper.sleep();
        } else {
          handleReportForDutyResponse(w);
          break;
        }
      }
{code}
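
The sequence above can be reproduced with a small standalone simulation. This 
is a sketch under stated assumptions, not HBase code: SelfReportSketch, 
runServerThread, simulate and the checkAbortBeforeReporting flag are all 
hypothetical names, and the optional abort check is shown only for contrast. It 
is not a fix taken from the HBase code base.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical simulation of the race described above; not the HBase classes.
final class SelfReportSketch {
  private volatile boolean aborted = false;
  private volatile boolean activeMaster = false;
  private final boolean tablesOnMaster;
  final List<String> events = Collections.synchronizedList(new ArrayList<>());

  SelfReportSketch(boolean tablesOnMaster) {
    this.tablesOnMaster = tablesOnMaster;
  }

  // Stand-in for the step-2 thread: wait, then try to register with the master.
  void runServerThread(boolean checkAbortBeforeReporting) throws InterruptedException {
    while (!(tablesOnMaster && activeMaster) && !aborted) {
      Thread.sleep(10); // plays the role of sleeper.sleep()
    }
    if (checkAbortBeforeReporting && aborted) {
      events.add("skipped reportForDuty"); // assumed guard, for illustration only
      return;
    }
    events.add("reportForDuty"); // what the log in this report shows
  }

  // Runs one scenario: become active, fail initialization, abort.
  static List<String> simulate(boolean checkAbortBeforeReporting) {
    SelfReportSketch master = new SelfReportSketch(false);
    Thread server = new Thread(() -> {
      try {
        master.runServerThread(checkAbortBeforeReporting);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    server.start();
    master.activeMaster = true; // registered as active master
    try {
      Thread.sleep(50);         // initialization runs, then times out
      master.aborted = true;    // abort("Unhandled exception. Starting shutdown.", t)
      server.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return master.events;
  }

  public static void main(String[] args) {
    System.out.println(simulate(false)); // waiting thread reports after the abort
    System.out.println(simulate(true));  // assumed guard skips the report
  }
}
```

Without the assumed check, the waiting thread leaves the loop because of the 
abort and still records a reportForDuty, matching the log above.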



