[ https://issues.apache.org/jira/browse/HBASE-25260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232548#comment-17232548 ]
Anoop Sam John commented on HBASE-25260:
----------------------------------------

What about the WAL system? Did you happen to delete or change the WAL FS between stopping the 2.0.x cluster and starting the new, upgraded cluster?

> upgrading hbase from 2.0.6 to 2.1.1, HMaster failed to become active because it cannot find hbase:namespace table
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-25260
>                 URL: https://issues.apache.org/jira/browse/HBASE-25260
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.1.1, 2.0.6
>            Reporter: Yongle Zhang
>            Priority: Major
>         Attachments: hmaster.log
>
>
> When we upgraded an HBase cluster from 2.0.6 to 2.1.1, the HMaster on the upgraded node failed to start.
> A stack trace from the error log:
> {code:java}
> 2020-11-06 02:01:26,420 WARN [PEWorker-12] assignment.RegionTransitionProcedure: Failed transition, suspend 1secs pid=12, ppid=9, state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure table=TestTable, region=37d62d2c1934da269a592e0e5cbca82a; rit=OFFLINE, location=null; waiting on rectified condition fixed by other Procedure or operator intervention
> org.apache.hadoop.hbase.master.TableStateManager$TableStateNotFoundException: TestTable
>     at org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:215)
>     at org.apache.hadoop.hbase.master.assignment.AssignProcedure.assign(AssignProcedure.java:194)
>     at org.apache.hadoop.hbase.master.assignment.AssignProcedure.startTransition(AssignProcedure.java:205)
>     at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:355)
>     at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:97)
>     at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:957)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1835)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1595)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1200(ProcedureExecutor.java:80)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2140)
> {code}
> This seems to be caused by the master being unable to find the hbase:namespace table after the upgrade:
> {code:java}
> 2020-11-06 02:01:26,791 ERROR [master/399fd6ca0c6d:16000:becomeActiveMaster] master.HMaster: Master server abort: loaded coprocessors are: []
> 2020-11-06 02:01:26,791 ERROR [master/399fd6ca0c6d:16000:becomeActiveMaster] master.HMaster: ***** ABORTING master 399fd6ca0c6d,16000,1604628075265: Unhandled exception. Starting shutdown. *****
> java.lang.IllegalStateException: Expected the service ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED
>     at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345)
>     at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:291)
>     at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1253)
>     at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1031)
>     at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2254)
>     at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:583)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hbase.TableNotFoundException: hbase:namespace
>     at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:864)
>     at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:759)
>     at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
>     at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:745)
>     at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
>     at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:716)
>     at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
>     at org.apache.hadoop.hbase.client.ConnectionImplementation.getRegionLocation(ConnectionImplementation.java:594)
>     at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.getRegionLocation(ConnectionUtils.java:131)
>     at org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:72)
>     at org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:223)
>     at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
>     at org.apache.hadoop.hbase.client.HTable.get(HTable.java:386)
>     at org.apache.hadoop.hbase.client.HTable.get(HTable.java:360)
>     at org.apache.hadoop.hbase.master.TableNamespaceManager.get(TableNamespaceManager.java:142)
>     at org.apache.hadoop.hbase.master.TableNamespaceManager.isTableAvailableAndInitialized(TableNamespaceManager.java:279)
>     at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:104)
>     at org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:63)
>     at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:226)
>     at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1251)
>     ... 4 more
> {code}
> The error log file is attached.
> [^hmaster.log]
>
> Steps to reproduce:
> # Start up a cluster of version 2.0.6 with 3 nodes.
> # Use hbase pe to write data.
> {code:java}
> /hbase/bin/hbase pe --nomapred --oneCon=true --valueSize=10 --rows=100 sequentialWrite 1{code}
> # Stop the cluster:
> ## Use graceful_stop.sh to stop all regionservers.
> ## Then run stop-hbase.sh.
> # Upgrade the node to 2.1.1.
> # After upgrading, HMaster failed to start.
>
>
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
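
The quoted reproduction steps could be scripted roughly as follows. This is only a minimal sketch, not the reporter's actual procedure: the `HBASE_HOME` default is taken from the `/hbase/bin/hbase` path in the report, while the regionserver host names (`rs1 rs2 rs3`), the `DRY_RUN` switch, and the `run` helper are hypothetical. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch of the reproduction steps quoted above (write data, then stop the cluster).
# Assumptions, not from the report: regionserver host names, DRY_RUN, the run() helper.
set -euo pipefail

HBASE_HOME="${HBASE_HOME:-/hbase}"             # the report invokes /hbase/bin/hbase
REGIONSERVERS="${REGIONSERVERS:-rs1 rs2 rs3}"  # hypothetical host names
DRY_RUN="${DRY_RUN:-1}"                        # 1 = print commands only, do not execute

# Print the command; execute it only when DRY_RUN is not 1.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

# Step 2: write test data with the pe tool (command copied from the report).
run "$HBASE_HOME/bin/hbase" pe --nomapred --oneCon=true --valueSize=10 --rows=100 sequentialWrite 1

# Step 3a: gracefully stop each regionserver, one host at a time.
for host in $REGIONSERVERS; do
  run "$HBASE_HOME/bin/graceful_stop.sh" "$host"
done

# Step 3b: stop the rest of the cluster.
run "$HBASE_HOME/bin/stop-hbase.sh"

# Steps 4 and 5 (upgrading the node to 2.1.1 and restarting) depend on how HBase
# was installed, so they are left out of this sketch.
```

Running it with `DRY_RUN=0` on a node of the 2.0.6 cluster would execute the commands instead of printing them.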