[jira] [Commented] (HBASE-25239) Upgrading HBase from 2.2.0/2.3.3 to master(3.0.0) fails because HMaster “Failed to become active master”

Duo Zhang (Jira) Mon, 02 Nov 2020 23:07:48 -0800


    [ 
https://issues.apache.org/jira/browse/HBASE-25239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17225179#comment-17225179
 ]


Duo Zhang commented on HBASE-25239:
-----------------------------------

We will migrate the data in the namespace table to the meta table, so there must be 
live region servers to host the namespace table and the meta table.
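
One way to read the requirement above is that region servers should be brought up together with the upgraded master, so that the namespace and meta regions can be assigned before the migration runs. The following is only a minimal sketch under that assumption. The `DRY_RUN` switch and the `run` helper are hypothetical additions for illustration, and the `/hbase` default simply mirrors the install path that appears in the report; `hbase-daemon.sh` and `hbase-daemons.sh` are the stock scripts in HBase's `bin` directory.

```shell
#!/usr/bin/env bash
# Sketch only: start region servers alongside the upgraded master.
# DRY_RUN and run() are hypothetical helpers of this sketch, not HBase options.
HBASE_HOME="${HBASE_HOME:-/hbase}"   # assumed install path (as seen in the report)
DRY_RUN="${DRY_RUN:-1}"              # 1 = only print the commands, 0 = execute them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

start_upgraded_cluster() {
  # Master on this node.
  run "$HBASE_HOME/bin/hbase-daemon.sh" start master
  # Region servers on the hosts listed in conf/regionservers.
  run "$HBASE_HOME/bin/hbase-daemons.sh" start regionserver
}

start_upgraded_cluster
```

By default the script only prints the two commands; setting `DRY_RUN=0` would execute them. Which version the region servers should run during the upgrade is not stated in this comment, so the sketch leaves that open.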

> Upgrading HBase from 2.2.0/2.3.3 to master(3.0.0) fails because HMaster 
> “Failed to become active master”
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-25239
>                 URL: https://issues.apache.org/jira/browse/HBASE-25239
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.2.0, 2.3.3
>            Reporter: Zhuqi Jin
>            Priority: Major
>
> When we upgraded an HBase cluster from 2.2.0/2.3.3 to 
> master (c303f9d329d578d31140e507bdbcbe3aa097042b), the HMaster on the upgraded 
> node failed to start.
> The error message is shown below:
> {code:java}
> 2020-11-03 02:52:27,809 ERROR [master/65cddff041f6:16000:becomeActiveMaster] master.HMaster: Failed to become active master
> java.lang.IllegalStateException: Expected the service ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED
>     at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:379)
>     at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:319)
>     at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1362)
>     at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1137)
>     at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2245)
>     at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:626)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 2 actions: RetriesExhaustedException: 2 times, servers with issues:
>     at org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.makeError(BufferedMutatorOverAsyncBufferedMutator.java:107)
>     at org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.internalFlush(BufferedMutatorOverAsyncBufferedMutator.java:122)
>     at org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.close(BufferedMutatorOverAsyncBufferedMutator.java:166)
>     at org.apache.hadoop.hbase.master.TableNamespaceManager.migrateNamespaceTable(TableNamespaceManager.java:93)
>     at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:123)
>     at org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:61)
>     at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:249)
>     at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1360)
>     ... 4 more
> 2020-11-03 02:52:27,810 ERROR [master/65cddff041f6:16000:becomeActiveMaster] master.HMaster: Master server abort: loaded coprocessors are: []
> 2020-11-03 02:52:27,810 ERROR [master/65cddff041f6:16000:becomeActiveMaster] master.HMaster: ***** ABORTING master 65cddff041f6,16000,1604371935915: Unhandled exception. Starting shutdown. *****
> java.lang.IllegalStateException: Expected the service ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED
>     at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:379)
>     at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:319)
>     at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1362)
>     at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1137)
>     at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2245)
>     at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:626)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 2 actions: RetriesExhaustedException: 2 times, servers with issues:
>     at org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.makeError(BufferedMutatorOverAsyncBufferedMutator.java:107)
>     at org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.internalFlush(BufferedMutatorOverAsyncBufferedMutator.java:122)
>     at org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.close(BufferedMutatorOverAsyncBufferedMutator.java:166)
>     at org.apache.hadoop.hbase.master.TableNamespaceManager.migrateNamespaceTable(TableNamespaceManager.java:93)
>     at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:123)
>     at org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:61)
>     at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:249)
>     at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1360)
>     ... 4 more
> 2020-11-03 02:52:27,810 INFO  [master/65cddff041f6:16000:becomeActiveMaster] regionserver.HRegionServer: ***** STOPPING region server '65cddff041f6,16000,1604371935915' *****
> 2020-11-03 02:52:27,810 INFO  [master/65cddff041f6:16000:becomeActiveMaster] regionserver.HRegionServer: STOPPED: Stopped by master/65cddff041f6:16000:becomeActiveMaster
> 2020-11-03 02:52:27,811 INFO  [master/65cddff041f6:16000] regionserver.HRegionServer: Stopping infoServer
> 2020-11-03 02:52:27,823 INFO  [master/65cddff041f6:16000] handler.ContextHandler: Stopped o.e.j.w.WebAppContext@47e4d9d0{/,null,UNAVAILABLE}{file:/hbase/hbase-webapps/master}
> 2020-11-03 02:52:27,839 INFO  [master/65cddff041f6:16000] server.AbstractConnector: Stopped ServerConnector@2098d37d{HTTP/1.1,[http/1.1]}{0.0.0.0:16010}
> 2020-11-03 02:52:27,839 INFO  [master/65cddff041f6:16000] handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler@303a5119{/static,file:///hbase/hbase-webapps/static/,UNAVAILABLE}
> 2020-11-03 02:52:27,839 INFO  [master/65cddff041f6:16000] handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler@38548b19{/logs,file:///hbase/logs/,UNAVAILABLE}
> 2020-11-03 02:52:27,844 INFO  [master/65cddff041f6:16000] regionserver.HRegionServer: aborting server 65cddff041f6,16000,1604371935915
> 2020-11-03 02:52:27,850 INFO  [master/65cddff041f6:16000] regionserver.HRegionServer: stopping server 65cddff041f6,16000,1604371935915; all regions closed.
> 2020-11-03 02:52:27,851 INFO  [master/65cddff041f6:16000] hbase.ChoreService: Chore service for: master/65cddff041f6:16000 had [ScheduledChore name=FlushedSequenceIdFlusher, period=10800000, unit=MILLISECONDS] on shutdown
> 2020-11-03 02:52:27,857 INFO  [master/65cddff041f6:16000] master.ServerManager: Writing .lastflushedseqids file at: file:/var/lib/hbase/.lastflushedseqids
> 2020-11-03 02:52:27,874 INFO  [master/65cddff041f6:16000] assignment.AssignmentManager: Stopping assignment manager
> 2020-11-03 02:52:27,875 INFO  [master/65cddff041f6:16000] procedure2.RemoteProcedureDispatcher: Stopping procedure remote dispatcher
> 2020-11-03 02:52:27,877 INFO  [master/65cddff041f6:16000] procedure2.ProcedureExecutor: Stopping
> 2020-11-03 02:52:27,882 INFO  [master/65cddff041f6:16000] region.RegionProcedureStore: Stopping the Region Procedure Store, isAbort=true
> 2020-11-03 02:52:27,883 INFO  [master/65cddff041f6:16000] store.LocalRegion: Closing local region {ENCODED => 1595e783b53d99cd5eef43b6debb2682, NAME => 'master:store,,1.1595e783b53d99cd5eef43b6debb2682.', STARTKEY => '', ENDKEY => ''}, isAbort=true
> 2020-11-03 02:52:27,888 INFO  [master/65cddff041f6:16000] regionserver.HRegion: Closing region master:store,,1.1595e783b53d99cd5eef43b6debb2682.
> 2020-11-03 02:52:27,889 ERROR [master/65cddff041f6:16000] regionserver.HRegion: Memstore data size is 26957 in region master:store,,1.1595e783b53d99cd5eef43b6debb2682.
> 2020-11-03 02:52:27,890 INFO  [master/65cddff041f6:16000] regionserver.HRegion: Closed master:store,,1.1595e783b53d99cd5eef43b6debb2682.
> 2020-11-03 02:52:27,890 INFO  [master/65cddff041f6:16000] hbase.ChoreService: Chore service for: master/65cddff041f6:16000.splitLogManager. had [ScheduledChore name=SplitLogManager Timeout Monitor, period=1000, unit=MILLISECONDS] on shutdown
> 2020-11-03 02:52:27,890 INFO  [master:store-WAL-Roller] wal.AbstractWALRoller: LogRoller exiting.
> 2020-11-03 02:52:27,892 INFO  [master/65cddff041f6:16000] flush.MasterFlushTableProcedureManager: stop: server shutting down.
> 2020-11-03 02:52:27,894 INFO  [master/65cddff041f6:16000] ipc.NettyRpcServer: Stopping server on /252.17.1.2:16000
> 2020-11-03 02:52:28,058 INFO  [ReadOnlyZKClient-252.17.1.5:2181@0x3e505444] zookeeper.ZooKeeper: Session: 0x10139a450ef001b closed
> 2020-11-03 02:52:28,058 INFO  [ReadOnlyZKClient-252.17.1.5:2181@0x3e505444-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x10139a450ef001b
> 2020-11-03 02:52:28,166 INFO  [master/65cddff041f6:16000] zookeeper.ZooKeeper: Session: 0x10139a450ef0018 closed
> 2020-11-03 02:52:28,166 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x10139a450ef0018
> 2020-11-03 02:52:28,166 INFO  [master/65cddff041f6:16000] regionserver.HRegionServer: Exiting; stopping=65cddff041f6,16000,1604371935915; zookeeper connection closed.
> 2020-11-03 02:52:28,168 ERROR [main] master.HMasterCommandLine: Master exiting
> java.lang.RuntimeException: HMaster Aborted
>     at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:244)
>     at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>     at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
>     at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:3077)
> {code}
> It can be reproduced through the following steps: 
>  # Start a 3-node cluster running 2.2.0 (rel/2.2.0) or 2.3.3 (branch-2.3).
>  # Use hbase pe to write data:
> {code:java}
>  /hbase/bin/hbase pe --nomapred --oneCon=true --valueSize=10 --rows=100 sequentialWrite
> {code}
>  # Stop the cluster:
>  ## Use graceful_stop.sh to stop all region servers.
>  ## Then run stop-hbase.sh.
>  # Upgrade the node to master (c303f9d329d578d31140e507bdbcbe3aa097042b).
>  # After the upgrade, HMaster fails to start, as the log 
> hbase--master-eca51d951598.log shows.
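
To check whether a master log shows this same failure, it may be enough to look for the two markers that appear in the trace quoted above: the "Failed to become active master" message and the `TableNamespaceManager.migrateNamespaceTable` frame. The helper below is a minimal sketch of that check; the function name is hypothetical, and the log path is whatever file your master writes.

```shell
#!/usr/bin/env bash
# Sketch only: succeed (exit status 0) when the given master log contains
# both markers of the namespace-migration failure quoted in this report.
# The function name is a hypothetical choice for this example.
shows_namespace_migration_failure() {
  local log_file="$1"
  # -F matches the strings literally, -q suppresses output.
  grep -qF 'Failed to become active master' "$log_file" &&
    grep -qF 'TableNamespaceManager.migrateNamespaceTable' "$log_file"
}

# Example call (the path is a placeholder for your own master log):
#   shows_namespace_migration_failure /path/to/master.log && echo "same failure"
```

A match only indicates that the master aborted while migrating the namespace table; it does not by itself say why the writes could not be delivered.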



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
