[
https://issues.apache.org/jira/browse/HBASE-13194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356854#comment-14356854
]
zhangduo commented on HBASE-13194:
----------------------------------
The problem seems to be here.
{noformat}
2015-03-10 22:42:01,337 INFO [MASTER_SERVER_OPERATIONS-hemera:48616-0]
handler.ServerShutdownHandler(186): Mark regions in recovery for crashed server
hemera.apache.org,36185,1426027305449 before assignment; regions=[{ENCODED =>
969aa3ccca0a77c1d68f296b93b2d064, NAME =>
'hbase:namespace,,1426027307874.969aa3ccca0a77c1d68f296b93b2d064.', STARTKEY =>
'', ENDKEY => ''}]
2015-03-10 22:42:01,338 DEBUG [MASTER_SERVER_OPERATIONS-hemera:48616-0]
zookeeper.ZKUtil(745): master:48616-0x14c05d9d745000b, quorum=localhost:63193,
baseZNode=/hbase Unable to get data of znode
/hbase/recovering-regions/969aa3ccca0a77c1d68f296b93b2d064 because node does
not exist (not an error)
2015-03-10 22:42:01,351 INFO [hemera:48616.activeMasterManager]
master.AssignmentManager(416): Joined the cluster in 69ms, failover=true
2015-03-10 22:42:01,360 DEBUG [MASTER_SERVER_OPERATIONS-hemera:48616-0]
coordination.ZKSplitLogManagerCoordination(650): Marked
969aa3ccca0a77c1d68f296b93b2d064 as recovering from
hemera.apache.org,36185,1426027305449:
/hbase/recovering-regions/969aa3ccca0a77c1d68f296b93b2d064/hemera.apache.org,36185,1426027305449
2015-03-10 22:42:01,360 DEBUG [MASTER_SERVER_OPERATIONS-hemera:48616-0]
master.RegionStates(492): Adding to processed servers
hemera.apache.org,36185,1426027305449
2015-03-10 22:42:01,360 INFO [MASTER_SERVER_OPERATIONS-hemera:48616-0]
master.RegionStates(1074): Transition {969aa3ccca0a77c1d68f296b93b2d064
state=OPEN, ts=1426027321326, server=hemera.apache.org,36185,1426027305449} to
{969aa3ccca0a77c1d68f296b93b2d064 state=OFFLINE, ts=1426027321360,
server=hemera.apache.org,36185,1426027305449}
2015-03-10 22:42:01,361 INFO [MASTER_SERVER_OPERATIONS-hemera:48616-0]
master.RegionStateStore(207): Updating row
hbase:namespace,,1426027307874.969aa3ccca0a77c1d68f296b93b2d064. with
state=OFFLINE
2015-03-10 22:42:01,369 INFO [MASTER_SERVER_OPERATIONS-hemera:48616-0]
handler.ServerShutdownHandler(218): Reassigning 1 region(s) that
hemera.apache.org,36185,1426027305449 was carrying (and 0 regions(s) that were
opening on this server)
{noformat}
HMaster is also a RegionServer which carries the system table regions. When it
restarts, the system region's state apparently stays OPEN until we begin to
recover it. So we pass the isTableAssigned check in TableNamespaceManager.start,
but the following calls to isTableAvailableAndInitialized all fail, because we
have just begun to recover the region and its state has been transitioned to
OFFLINE. I am not sure why this happens; I think the state should not be OPEN
when HMaster starts. Will continue tomorrow.
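To make the suspected race concrete, here is a minimal, self-contained toy
model of what the log above suggests. This is NOT HBase code: the thread and
check names only mirror TableNamespaceManager.start / isTableAssigned /
isTableAvailableAndInitialized from the analysis above, and NamespaceInitRace
and its helpers are hypothetical.
{code:java}
// Toy model of the suspected race; names mirror the log but this is
// NOT HBase code. The hbase:namespace region state starts out OPEN
// because the crashed server (the master's own RegionServer) left a
// stale OPEN entry behind.
import java.util.concurrent.atomic.AtomicReference;

public class NamespaceInitRace {
  enum RegionState { OPEN, OFFLINE }

  static final AtomicReference<RegionState> nsRegion =
      new AtomicReference<>(RegionState.OPEN);

  public static void main(String[] args) throws Exception {
    // activeMasterManager thread: TableNamespaceManager.start().
    Thread masterInit = new Thread(() -> {
      // The isTableAssigned-style check passes on the stale OPEN state...
      if (nsRegion.get() == RegionState.OPEN) {
        sleep(50); // meanwhile the ServerShutdownHandler thread runs
        // ...but the later isTableAvailableAndInitialized-style check
        // fails, and since there is no retry it fails for good.
        if (nsRegion.get() != RegionState.OPEN) {
          System.out.println(
              "Table Namespace Manager not ready yet, try again later");
        }
      }
    });

    // MASTER_SERVER_OPERATIONS thread: ServerShutdownHandler transitions
    // the region OPEN -> OFFLINE to reassign it from the crashed server.
    Thread shutdownHandler = new Thread(() -> {
      sleep(10);
      nsRegion.set(RegionState.OFFLINE);
    });

    masterInit.start();
    shutdownHandler.start();
    masterInit.join();
    shutdownHandler.join();
  }

  static void sleep(long ms) {
    try { Thread.sleep(ms); } catch (InterruptedException ignored) {}
  }
}
{code}
Run as written, the model prints the same "not ready yet" message as the real
failure: the first check sees the stale OPEN state and passes, then the
shutdown-handler thread flips the region to OFFLINE before the second check.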
> TableNamespaceManager not ready cause MasterQuotaManager initialization fail
> -----------------------------------------------------------------------------
>
> Key: HBASE-13194
> URL: https://issues.apache.org/jira/browse/HBASE-13194
> Project: HBase
> Issue Type: Bug
> Components: master
> Reporter: zhangduo
>
> This causes TestNamespaceAuditor to fail.
> https://builds.apache.org/job/HBase-TRUNK/6237/testReport/junit/org.apache.hadoop.hbase.namespace/TestNamespaceAuditor/testRegionOperations/
> {noformat}
> 2015-03-10 22:42:01,372 ERROR [hemera:48616.activeMasterManager]
> namespace.NamespaceStateManager(204): Error while update namespace state.
> java.io.IOException: Table Namespace Manager not ready yet, try again later
> at
> org.apache.hadoop.hbase.master.HMaster.checkNamespaceManagerReady(HMaster.java:1912)
> at
> org.apache.hadoop.hbase.master.HMaster.listNamespaceDescriptors(HMaster.java:2131)
> at
> org.apache.hadoop.hbase.namespace.NamespaceStateManager.initialize(NamespaceStateManager.java:188)
> at
> org.apache.hadoop.hbase.namespace.NamespaceStateManager.start(NamespaceStateManager.java:63)
> at
> org.apache.hadoop.hbase.namespace.NamespaceAuditor.start(NamespaceAuditor.java:57)
> at
> org.apache.hadoop.hbase.quotas.MasterQuotaManager.start(MasterQuotaManager.java:88)
> at
> org.apache.hadoop.hbase.master.HMaster.initQuotaManager(HMaster.java:902)
> at
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:756)
> at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:161)
> at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1455)
> at java.lang.Thread.run(Thread.java:744)
> {noformat}
> The direct reason is that we do not have a retry here: if init fails once, it
> fails permanently. But I skimmed the code, and there seem to be no async init
> operations when calling finishActiveMasterInitialization, so this is very
> strange. Need to dig more. (See the retry sketch after this description.)
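The quoted description notes that a single init failure is permanent because
there is no retry. As a rough illustration only (not a patch: RetryingInit and
initWithRetry are hypothetical names, and a real change would sit around
HMaster.initQuotaManager in finishActiveMasterInitialization), a bounded retry
might look like this:
{code:java}
// Hypothetical bounded-retry helper, shown only to illustrate the
// "no retry" observation above; it is not actual HMaster code.
import java.io.IOException;

public class RetryingInit {
  interface Init { void run() throws IOException; }

  // Retry a failed init a bounded number of times with linear backoff
  // instead of failing once and staying failed.
  static void initWithRetry(Init init, int maxAttempts, long backoffMs)
      throws IOException, InterruptedException {
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        init.run();
        return;
      } catch (IOException e) {
        last = e; // e.g. "Table Namespace Manager not ready yet"
        Thread.sleep(backoffMs * attempt);
      }
    }
    throw last;
  }

  public static void main(String[] args) throws Exception {
    // Demo: fail twice, then succeed on the third attempt.
    int[] calls = {0};
    initWithRetry(() -> {
      if (++calls[0] < 3) {
        throw new IOException(
            "Table Namespace Manager not ready yet, try again later");
      }
      System.out.println("init succeeded on attempt " + calls[0]);
    }, 5, 100);
  }
}
{code}
Whether a retry is the right fix, or the stale OPEN state should instead be
corrected before TableNamespaceManager.start runs, is exactly the open
question in the comment above.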