[ 
https://issues.apache.org/jira/browse/HBASE-25260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232548#comment-17232548
 ] 

Anoop Sam John commented on HBASE-25260:
----------------------------------------

What about the WAL system? Did you happen to delete or change the WAL filesystem 
between stopping the 2.0.x cluster and starting the upgraded cluster?

> upgrading hbase from 2.0.6 to 2.1.1, HMaster failed to become active because 
> it cannot find hbase:namespace table
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-25260
>                 URL: https://issues.apache.org/jira/browse/HBASE-25260
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.1.1, 2.0.6
>            Reporter: Yongle Zhang
>            Priority: Major
>         Attachments: hmaster.log
>
>
> When we upgraded an HBase cluster from 2.0.6 to 2.1.1, the HMaster on the 
> upgraded node failed to start.
> The error log contains this stack trace:
> {code:java}
> 2020-11-06 02:01:26,420 WARN  [PEWorker-12] 
> assignment.RegionTransitionProcedure: Failed transition, suspend 1secs 
> pid=12, ppid=9, state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; 
> AssignProcedure table=TestTable, region=37d62d2c1934da269a592e0e5cbca82a; 
> rit=OFFLINE, location=null; waiting on rectified condition fixed by other 
> Procedure or operator intervention
> org.apache.hadoop.hbase.master.TableStateManager$TableStateNotFoundException: 
> TestTable
>   at 
> org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:215)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignProcedure.assign(AssignProcedure.java:194)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignProcedure.startTransition(AssignProcedure.java:205)
>   at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:355)
>   at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:97)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:957)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1835)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1595)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1200(ProcedureExecutor.java:80)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2140)
> {code}
> This seems to be caused by the master being unable to find the 
> hbase:namespace table after the upgrade: 
> {code:java}
> 2020-11-06 02:01:26,791 ERROR [master/399fd6ca0c6d:16000:becomeActiveMaster] 
> master.HMaster: Master server abort: loaded coprocessors are: []
> 2020-11-06 02:01:26,791 ERROR [master/399fd6ca0c6d:16000:becomeActiveMaster] 
> master.HMaster: ***** ABORTING master 399fd6ca0c6d,16000,1604628075265: 
> Unhandled exception. Starting shutdown. *****
> java.lang.IllegalStateException: Expected the service 
> ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED
>   at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345)
>   at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:291)
>   at 
> org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1253)
>   at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1031)
>   at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2254)
>   at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:583)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hbase.TableNotFoundException: hbase:namespace
>   at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:864)
>   at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:759)
>   at 
> org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
>   at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:745)
>   at 
> org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
>   at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:716)
>   at 
> org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
>   at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.getRegionLocation(ConnectionImplementation.java:594)
>   at 
> org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.getRegionLocation(ConnectionUtils.java:131)
>   at 
> org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:72)
>   at 
> org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:223)
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
>   at org.apache.hadoop.hbase.client.HTable.get(HTable.java:386)
>   at org.apache.hadoop.hbase.client.HTable.get(HTable.java:360)
>   at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.get(TableNamespaceManager.java:142)
>   at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.isTableAvailableAndInitialized(TableNamespaceManager.java:279)
>   at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:104)
>   at 
> org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:63)
>   at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:226)
>   at 
> org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1251)
>   ... 4 more
> {code}
> The error log file is attached. 
> [^hmaster.log]
>  
> Steps to reproduce: 
>  # Start up a cluster of version 2.0.6 with 3 nodes
>  # Use hbase pe to write data. 
> {code:java}
> /hbase/bin/hbase pe --nomapred --oneCon=true --valueSize=10 --rows=100 
> sequentialWrite 1{code}
>  # Stop the cluster:
>  ## Use graceful_stop.sh to stop all regionservers.
>  ## Then run stop-hbase.sh.
>  # Upgrade the node to 2.1.1.
>  # After the upgrade, HMaster failed to start.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
