[ 
https://issues.apache.org/jira/browse/HBASE-28815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882691#comment-17882691
 ] 

Duo Zhang commented on HBASE-28815:
-----------------------------------

In general we should support upgrading to 2.6 directly; at the very least, we 
do not want to break it.

So I think we should fix this problem, instead of making users upgrade to an 
older minor version first and then to 2.6.x.

> Upgrade from 1.7.2 to 2.6.0 failed: HMaster aborted
> ---------------------------------------------------
>
>                 Key: HBASE-28815
>                 URL: https://issues.apache.org/jira/browse/HBASE-28815
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 2.6.0
>            Reporter: Ke Han
>            Priority: Major
>
> I am trying to migrate from a 1.7.2 cluster to 2.6.0 (both are released 
> versions). However, I observed that the HMaster crashed during the upgrade 
> process.
> h1. Reproduce
> Step 1: Start up a 1.7.2 HBase cluster (1 HDFS, 1 HM, 1 RS).
> Step 2: Stop the 1.7.2 HBase cluster.
> Step 3: Upgrade the cluster to HBase 2.6.0.
> HMaster will crash with the following exception:
> {code:java}
> 2024-09-04T16:04:47,004 WARN  [PEWorker-2] procedure.InitMetaProcedure: 
> Failed to init meta, suspend 1000secs
> java.io.IOException: Meta table is not partial, please sideline this meta 
> directory or run HBCK to fix this meta table, e.g. rebuild the server 
> hostname in ZNode for the meta region
>         at 
> org.apache.hadoop.hbase.master.procedure.InitMetaProcedure.deleteMetaTableDirectoryIfPartial(InitMetaProcedure.java:199)
>  ~[hbase-server-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.master.procedure.InitMetaProcedure.writeFsLayout(InitMetaProcedure.java:78)
>  ~[hbase-server-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.master.procedure.InitMetaProcedure.executeFromState(InitMetaProcedure.java:102)
>  ~[hbase-server-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.master.procedure.InitMetaProcedure.executeFromState(InitMetaProcedure.java:54)
>  ~[hbase-server-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:944) 
> ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1766)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1444)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:77)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:2092)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216) 
> ~[hbase-common-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2119)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
> 2024-09-04T16:04:47,005 INFO  [PEWorker-2] procedure2.TimeoutExecutorThread: 
> ADDED pid=1, state=WAITING_TIMEOUT:INIT_META_WRITE_FS_LAYOUT, locked=true; 
> InitMetaProcedure table=hbase:meta; timeout=1000, timestamp=1725465888005
> 2024-09-04T16:04:48,045 ERROR [PEWorker-1] procedure2.ProcedureExecutor: Root 
> Procedure pid=1, state=FAILED:INIT_META_WRITE_FS_LAYOUT, 
> exception=org.apache.hadoop.hbase.exceptions.TimeoutIOException via 
> ProcedureExecutor:org.apache.hadoop.hbase.exceptions.TimeoutIOException: 
> Operation timed out after 1.0010 sec; InitMetaProcedure table=hbase:meta does 
> not support rollback but the execution failed and try to rollback, code bug?
> org.apache.hadoop.hbase.procedure2.RemoteProcedureException: 
> org.apache.hadoop.hbase.exceptions.TimeoutIOException: Operation timed out 
> after 1.0010 sec
>         at 
> org.apache.hadoop.hbase.procedure2.Procedure.setFailure(Procedure.java:768) 
> ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.Procedure.setTimeoutFailure(Procedure.java:797)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.executeTimedoutProcedure(TimeoutExecutorThread.java:131)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.execDelayedProcedure(TimeoutExecutorThread.java:109)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.run(TimeoutExecutorThread.java:68)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
> Caused by: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Operation 
> timed out after 1.0010 sec
>         at 
> org.apache.hadoop.hbase.procedure2.Procedure.setTimeoutFailure(Procedure.java:798)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
>         ... 3 more
> 2024-09-04T16:04:48,058 INFO  [PEWorker-1] procedure2.ProcedureExecutor: 
> Rolled back pid=1, state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.exceptions.TimeoutIOException via 
> ProcedureExecutor:org.apache.hadoop.hbase.exceptions.TimeoutIOException: 
> Operation timed out after 1.0010 sec; InitMetaProcedure table=hbase:meta 
> exec-time=1.4160 sec
> 2024-09-04T16:04:48,059 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: Failed to become active master
> java.io.IOException: Failed to initialize meta table
>         at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1077)
>  ~[hbase-server-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2459)
>  ~[hbase-server-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.master.HMaster.lambda$null$0(HMaster.java:590) 
> ~[hbase-server-2.6.0.jar:2.6.0]
>         at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:187) 
> ~[hbase-common-2.6.0.jar:2.6.0]
>         at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:177) 
> ~[hbase-common-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$1(HMaster.java:587) 
> ~[hbase-server-2.6.0.jar:2.6.0]
>         at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
> Caused by: org.apache.hadoop.hbase.procedure2.RemoteProcedureException: 
> org.apache.hadoop.hbase.exceptions.TimeoutIOException: Operation timed out 
> after 1.0010 sec
>         at 
> org.apache.hadoop.hbase.procedure2.Procedure.setFailure(Procedure.java:768) 
> ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.Procedure.setTimeoutFailure(Procedure.java:797)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.executeTimedoutProcedure(TimeoutExecutorThread.java:131)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.execDelayedProcedure(TimeoutExecutorThread.java:109)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.run(TimeoutExecutorThread.java:68)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
> Caused by: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Operation 
> timed out after 1.0010 sec
>         at 
> org.apache.hadoop.hbase.procedure2.Procedure.setTimeoutFailure(Procedure.java:798)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.executeTimedoutProcedure(TimeoutExecutorThread.java:131)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.execDelayedProcedure(TimeoutExecutorThread.java:109)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.run(TimeoutExecutorThread.java:68)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
> 2024-09-04T16:04:48,060 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: Master server abort: loaded coprocessors are: 
> [org.apache.hadoop.hbase.quotas.MasterQuotasObserver]
> 2024-09-04T16:04:48,060 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: ***** ABORTING master hmaster,16000,1725465872458: Unhandled 
> exception. Starting shutdown. *****
> java.io.IOException: Failed to initialize meta table
>         at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1077)
>  ~[hbase-server-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2459)
>  ~[hbase-server-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.master.HMaster.lambda$null$0(HMaster.java:590) 
> ~[hbase-server-2.6.0.jar:2.6.0]
>         at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:187) 
> ~[hbase-common-2.6.0.jar:2.6.0]
>         at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:177) 
> ~[hbase-common-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$1(HMaster.java:587) 
> ~[hbase-server-2.6.0.jar:2.6.0]
>         at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
> Caused by: org.apache.hadoop.hbase.procedure2.RemoteProcedureException: 
> org.apache.hadoop.hbase.exceptions.TimeoutIOException: Operation timed out 
> after 1.0010 sec
>         at 
> org.apache.hadoop.hbase.procedure2.Procedure.setFailure(Procedure.java:768) 
> ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.Procedure.setTimeoutFailure(Procedure.java:797)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.executeTimedoutProcedure(TimeoutExecutorThread.java:131)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.execDelayedProcedure(TimeoutExecutorThread.java:109)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.run(TimeoutExecutorThread.java:68)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
> Caused by: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Operation 
> timed out after 1.0010 sec
>         at 
> org.apache.hadoop.hbase.procedure2.Procedure.setTimeoutFailure(Procedure.java:798)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.executeTimedoutProcedure(TimeoutExecutorThread.java:131)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.execDelayedProcedure(TimeoutExecutorThread.java:109)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.run(TimeoutExecutorThread.java:68)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
> 2024-09-04T16:04:48,065 INFO  [master/hmaster:16000:becomeActiveMaster] 
> regionserver.HRegionServer: ***** STOPPING region server 
> 'hmaster,16000,1725465872458' *****
> 2024-09-04T16:04:48,065 INFO  [master/hmaster:16000:becomeActiveMaster] 
> regionserver.HRegionServer: STOPPED: Stopped by 
> master/hmaster:16000:becomeActiveMaster
> 2024-09-04T16:04:48,067 INFO  [master/hmaster:16000] 
> regionserver.HRegionServer: Stopping infoServer
> 2024-09-04T16:04:48,169 INFO  [master/hmaster:16000] handler.ContextHandler: 
> Stopped 
> o.a.h.t.o.e.j.w.WebAppContext@78483404{master,/,null,STOPPED}{file:/hbase/hbase-2.6.0/hbase-webapps/master}
> 2024-09-04T16:04:48,180 INFO  [master/hmaster:16000] 
> server.AbstractConnector: Stopped ServerConnector@3a7ba368{HTTP/1.1, 
> (http/1.1)}{0.0.0.0:16010}
> 2024-09-04T16:04:48,180 INFO  [master/hmaster:16000] server.session: node0 
> Stopped scavenging
> 2024-09-04T16:04:48,182 INFO  [master/hmaster:16000] handler.ContextHandler: 
> Stopped 
> o.a.h.t.o.e.j.s.ServletContextHandler@b525e3a{static,/static,file:///hbase/hbase-2.6.0/hbase-webapps/static/,STOPPED}
> 2024-09-04T16:04:48,183 INFO  [master/hmaster:16000] handler.ContextHandler: 
> Stopped 
> o.a.h.t.o.e.j.s.ServletContextHandler@294e4c35{logs,/logs,file:///var/log/hbase/,STOPPED}
> 2024-09-04T16:04:48,188 INFO  [master/hmaster:16000] 
> regionserver.HRegionServer: aborting server hmaster,16000,1725465872458
> 2024-09-04T16:04:48,190 INFO  [master/hmaster:16000] 
> regionserver.HRegionServer: stopping server hmaster,16000,1725465872458; all 
> regions closed.
> 2024-09-04T16:04:48,191 WARN  [OldWALsCleaner-1] cleaner.LogCleaner: 
> Interrupted while cleaning old WALs, will try to clean it next round. Exiting.
> 2024-09-04T16:04:48,191 WARN  [OldWALsCleaner-0] cleaner.LogCleaner: 
> Interrupted while cleaning old WALs, will try to clean it next round. Exiting.
> 2024-09-04T16:04:48,194 INFO  [master/hmaster:16000] hbase.ChoreService: 
> Chore service for: master/hmaster:16000 had [] on shutdown
> 2024-09-04T16:04:48,196 INFO  [master/hmaster:16000] 
> procedure2.RemoteProcedureDispatcher: Stopping procedure remote dispatcher
> 2024-09-04T16:04:48,196 INFO  [master/hmaster:16000] 
> procedure2.ProcedureExecutor: Stopping
> 2024-09-04T16:04:48,205 INFO  [master/hmaster:16000] 
> region.RegionProcedureStore: Stopping the Region Procedure Store, isAbort=true
> 2024-09-04T16:04:48,208 WARN  [master/hmaster:16000] 
> master.ActiveMasterManager: Failed get of master address: 
> java.io.IOException: Can't get master address from ZooKeeper; znode data == 
> null
> 2024-09-04T16:04:48,208 INFO  [master/hmaster:16000] 
> assignment.AssignmentManager: Stopping assignment manager
> 2024-09-04T16:04:48,209 INFO  [master/hmaster:16000] region.MasterRegion: 
> Closing local region {ENCODED => 1595e783b53d99cd5eef43b6debb2682, NAME => 
> 'master:store,,1.1595e783b53d99cd5eef43b6debb2682.', STARTKEY => '', ENDKEY 
> => ''}, isAbort=true
> 2024-09-04T16:04:48,369 INFO  [master/hmaster:16000] regionserver.HRegion: 
> Closing region master:store,,1.1595e783b53d99cd5eef43b6debb2682.
> 2024-09-04T16:04:48,373 ERROR [master/hmaster:16000] regionserver.HRegion: 
> Memstore data size is 2190 in region 
> master:store,,1.1595e783b53d99cd5eef43b6debb2682.
> 2024-09-04T16:04:48,374 INFO  [master/hmaster:16000] regionserver.HRegion: 
> Closed master:store,,1.1595e783b53d99cd5eef43b6debb2682.
> 2024-09-04T16:04:48,374 INFO  [master/hmaster:16000] 
> flush.MasterFlushTableProcedureManager: stop: server shutting down.
> 2024-09-04T16:04:48,374 INFO  [master:store-WAL-Roller] 
> wal.AbstractWALRoller: LogRoller exiting.
> 2024-09-04T16:04:48,376 INFO  [master/hmaster:16000] ipc.NettyRpcServer: 
> Stopping server on /192.168.227.2:16000
> 2024-09-04T16:04:48,511 INFO  [master/hmaster:16000] zookeeper.ZooKeeper: 
> Session: 0x10009e5081e0000 closed
> 2024-09-04T16:04:48,511 INFO  [main-EventThread] zookeeper.ClientCnxn: 
> EventThread shut down for session: 0x10009e5081e0000
> 2024-09-04T16:04:48,511 INFO  [master/hmaster:16000] 
> regionserver.HRegionServer: Exiting; stopping=hmaster,16000,1725465872458; 
> zookeeper connection closed.
> 2024-09-04T16:04:48,512 ERROR [main] master.HMasterCommandLine: Master exiting
> java.lang.RuntimeException: HMaster Aborted
>         at 
> org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:255)
>  ~[hbase-server-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:147)
>  ~[hbase-server-2.6.0.jar:2.6.0]
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) 
> ~[hadoop-common-2.10.2.jar:?]
>         at 
> org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:140)
>  ~[hbase-server-2.6.0.jar:2.6.0] {code}
>  
> Upgrading from 1.7.2 to 2.4.18 still succeeds.
> I am wondering whether this is a backward incompatibility or whether I missed 
> any additional steps required to upgrade to 2.6.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
