[ 
https://issues.apache.org/jira/browse/HBASE-13194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357895#comment-14357895
 ] 

zhangduo commented on HBASE-13194:
----------------------------------

I see that for 'hbase:meta' we verify whether the target region 
server is still alive, and otherwise force an assign in 
finishActiveMasterInitialization.

But for the namespace table we do not have this check. The code is here:
{code:title=TableNamespaceManager.java}
  public void start() throws IOException {
    if (!MetaTableAccessor.tableExists(masterServices.getConnection(),
        TableName.NAMESPACE_TABLE_NAME)) {
      LOG.info("Namespace table not found. Creating...");
      createNamespaceTable(masterServices);
    }

    try {
      // Wait for the namespace table to be assigned.
      // If timed out, we will move ahead without initializing it.
      // So that it should be initialized later on lazily.
      long startTime = EnvironmentEdgeManager.currentTime();
      int timeout = conf.getInt(NS_INIT_TIMEOUT, DEFAULT_NS_INIT_TIMEOUT);
      while (!(isTableAssigned() && isTableEnabled())) {
        if (EnvironmentEdgeManager.currentTime() - startTime + 100 > timeout) {
          // We can't do anything if ns is not online.
          throw new IOException("Timedout " + timeout + "ms waiting for namespace table to " +
            "be assigned and enabled: " + getTableState());
        }
        Thread.sleep(100);
      }
    } catch (InterruptedException e) {
      throw (InterruptedIOException)new InterruptedIOException().initCause(e);
    }

    // initialize namespace table
    isTableAvailableAndInitialized();
  }
{code}
The comments say that we could do a lazy initialization. It is true that 
'isTableAvailableAndInitialized' has initialization code in it; the problem is 
NamespaceStateManager. NamespaceStateManager only calls its initialize method 
once, in its start method. If that call fails, there is no chance to do a 
'lazy initialization' later.
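
For illustration, here is a minimal sketch of what lazy initialization could look like. LazyStateManager and Loader are hypothetical names standing in for NamespaceStateManager and whatever loads its state; this is not the real HBase code, only the general pattern of retrying a failed init on the next access.

```java
import java.io.IOException;

/** Hypothetical stand-in for NamespaceStateManager; not the real HBase class. */
class LazyStateManager {
  /** Hypothetical hook for whatever loads the namespace state. */
  interface Loader {
    void load() throws IOException;
  }

  private final Loader loader;
  private volatile boolean initialized = false;

  LazyStateManager(Loader loader) {
    this.loader = loader;
  }

  /** Best-effort attempt at startup; a failure here is not fatal. */
  void start() {
    try {
      ensureInitialized();
    } catch (IOException e) {
      // Swallowed on purpose: the next call to ensureInitialized() retries.
    }
  }

  /** Meant to be called at the top of every operation that needs the state. */
  synchronized void ensureInitialized() throws IOException {
    if (!initialized) {
      loader.load();       // may throw; the flag stays false so a later call retries
      initialized = true;
    }
  }

  boolean isInitialized() {
    return initialized;
  }
}
```

With this shape, a failure inside start() only delays initialization until the first operation that calls ensureInitialized() successfully.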

To fix this test, I think using 'isTableAvailableAndInitialized' as the while 
loop's condition is enough, but it is still not 100% safe. Maybe we should also 
introduce lazy initialization in NamespaceStateManager?
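
The first suggestion, waiting on a single readiness check, could be sketched roughly as below. ReadinessWaiter and ReadyCheck are hypothetical names; in TableNamespaceManager the check would be 'isTableAvailableAndInitialized', and the timeout would come from NS_INIT_TIMEOUT. This is only an outline of the loop shape, not a patch.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

/** Hypothetical helper; mirrors the shape of the wait loop in start(). */
class ReadinessWaiter {
  /** Hypothetical stand-in for a check such as isTableAvailableAndInitialized. */
  interface ReadyCheck {
    boolean isReady() throws IOException;
  }

  /** Polls the check every pollMs until it returns true or timeoutMs elapses. */
  static void waitUntilReady(ReadyCheck check, long timeoutMs, long pollMs)
      throws IOException {
    long startTime = System.currentTimeMillis();
    while (!check.isReady()) {
      // Give up if the next sleep would take us past the timeout.
      if (System.currentTimeMillis() - startTime + pollMs > timeoutMs) {
        throw new IOException("Timed out after " + timeoutMs + "ms waiting for readiness");
      }
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException e) {
        throw (InterruptedIOException) new InterruptedIOException().initCause(e);
      }
    }
  }
}
```

Because the loop exits only once the check itself passes, nothing that runs after it can observe a table that is assigned but not yet initialized. A timeout still surfaces as an IOException, which is why a lazy retry is worth having as well.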

Any suggestions? I do not know who the experts are, maybe [~jxiang] [~octo47]? 
Thanks

> TableNamespaceManager not ready cause MasterQuotaManager initialization fail 
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-13194
>                 URL: https://issues.apache.org/jira/browse/HBASE-13194
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>            Reporter: zhangduo
>
> This causes TestNamespaceAuditor to fail.
> https://builds.apache.org/job/HBase-TRUNK/6237/testReport/junit/org.apache.hadoop.hbase.namespace/TestNamespaceAuditor/testRegionOperations/
> {noformat}
> 2015-03-10 22:42:01,372 ERROR [hemera:48616.activeMasterManager] namespace.NamespaceStateManager(204): Error while update namespace state.
> java.io.IOException: Table Namespace Manager not ready yet, try again later
>       at org.apache.hadoop.hbase.master.HMaster.checkNamespaceManagerReady(HMaster.java:1912)
>       at org.apache.hadoop.hbase.master.HMaster.listNamespaceDescriptors(HMaster.java:2131)
>       at org.apache.hadoop.hbase.namespace.NamespaceStateManager.initialize(NamespaceStateManager.java:188)
>       at org.apache.hadoop.hbase.namespace.NamespaceStateManager.start(NamespaceStateManager.java:63)
>       at org.apache.hadoop.hbase.namespace.NamespaceAuditor.start(NamespaceAuditor.java:57)
>       at org.apache.hadoop.hbase.quotas.MasterQuotaManager.start(MasterQuotaManager.java:88)
>       at org.apache.hadoop.hbase.master.HMaster.initQuotaManager(HMaster.java:902)
>       at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:756)
>       at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:161)
>       at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1455)
>       at java.lang.Thread.run(Thread.java:744)
> {noformat}
> The direct reason is that we do not have a retry here: if init fails once, it 
> always fails. But I skimmed the code, and there seem to be no async init 
> operations when calling finishActiveMasterInitialization, so it is very 
> strange. Need to dig more.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
