[
https://issues.apache.org/jira/browse/HBASE-28815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882691#comment-17882691
]
Duo Zhang commented on HBASE-28815:
-----------------------------------
In general we should support upgrading to 2.6 directly; at the very least, we
do not want to break it.
So I think we should fix this problem instead of making users upgrade to an
older minor version first and then to 2.6.x.
> Upgrade from 1.7.2 to 2.6.0 failed: HMaster aborted
> ---------------------------------------------------
>
> Key: HBASE-28815
> URL: https://issues.apache.org/jira/browse/HBASE-28815
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 2.6.0
> Reporter: Ke Han
> Priority: Major
>
> I am trying to migrate from a 1.7.2 cluster to 2.6.0 (both are released
> versions). However, I observed that the HMaster crashed during the upgrade
> process.
> h1. Reproduce
> Step 1: Start up a 1.7.2 HBase cluster (1 HDFS, 1 HM, 1 RS).
> Step 2: Stop the 1.7.2 HBase cluster.
> Step 3: Upgrade to a 2.6.0 HBase cluster.
> The HMaster crashes with the following exception:
> {code:java}
> 2024-09-04T16:04:47,004 WARN [PEWorker-2] procedure.InitMetaProcedure:
> Failed to init meta, suspend 1000secs
> java.io.IOException: Meta table is not partial, please sideline this meta
> directory or run HBCK to fix this meta table, e.g. rebuild the server
> hostname in ZNode for the meta region
> at
> org.apache.hadoop.hbase.master.procedure.InitMetaProcedure.deleteMetaTableDirectoryIfPartial(InitMetaProcedure.java:199)
> ~[hbase-server-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.master.procedure.InitMetaProcedure.writeFsLayout(InitMetaProcedure.java:78)
> ~[hbase-server-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.master.procedure.InitMetaProcedure.executeFromState(InitMetaProcedure.java:102)
> ~[hbase-server-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.master.procedure.InitMetaProcedure.executeFromState(InitMetaProcedure.java:54)
> ~[hbase-server-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)
> ~[hbase-procedure-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:944)
> ~[hbase-procedure-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1766)
> ~[hbase-procedure-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1444)
> ~[hbase-procedure-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:77)
> ~[hbase-procedure-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:2092)
> ~[hbase-procedure-2.6.0.jar:2.6.0]
> at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)
> ~[hbase-common-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2119)
> ~[hbase-procedure-2.6.0.jar:2.6.0]
> 2024-09-04T16:04:47,005 INFO [PEWorker-2] procedure2.TimeoutExecutorThread:
> ADDED pid=1, state=WAITING_TIMEOUT:INIT_META_WRITE_FS_LAYOUT, locked=true;
> InitMetaProcedure table=hbase:meta; timeout=1000, timestamp=1725465888005
> 2024-09-04T16:04:48,045 ERROR [PEWorker-1] procedure2.ProcedureExecutor: Root
> Procedure pid=1, state=FAILED:INIT_META_WRITE_FS_LAYOUT,
> exception=org.apache.hadoop.hbase.exceptions.TimeoutIOException via
> ProcedureExecutor:org.apache.hadoop.hbase.exceptions.TimeoutIOException:
> Operation timed out after 1.0010 sec; InitMetaProcedure table=hbase:meta does
> not support rollback but the execution failed and try to rollback, code bug?
> org.apache.hadoop.hbase.procedure2.RemoteProcedureException:
> org.apache.hadoop.hbase.exceptions.TimeoutIOException: Operation timed out
> after 1.0010 sec
> at
> org.apache.hadoop.hbase.procedure2.Procedure.setFailure(Procedure.java:768)
> ~[hbase-procedure-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.procedure2.Procedure.setTimeoutFailure(Procedure.java:797)
> ~[hbase-procedure-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.executeTimedoutProcedure(TimeoutExecutorThread.java:131)
> ~[hbase-procedure-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.execDelayedProcedure(TimeoutExecutorThread.java:109)
> ~[hbase-procedure-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.run(TimeoutExecutorThread.java:68)
> ~[hbase-procedure-2.6.0.jar:2.6.0]
> Caused by: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Operation
> timed out after 1.0010 sec
> at
> org.apache.hadoop.hbase.procedure2.Procedure.setTimeoutFailure(Procedure.java:798)
> ~[hbase-procedure-2.6.0.jar:2.6.0]
> ... 3 more
> 2024-09-04T16:04:48,058 INFO [PEWorker-1] procedure2.ProcedureExecutor:
> Rolled back pid=1, state=ROLLEDBACK,
> exception=org.apache.hadoop.hbase.exceptions.TimeoutIOException via
> ProcedureExecutor:org.apache.hadoop.hbase.exceptions.TimeoutIOException:
> Operation timed out after 1.0010 sec; InitMetaProcedure table=hbase:meta
> exec-time=1.4160 sec
> 2024-09-04T16:04:48,059 ERROR [master/hmaster:16000:becomeActiveMaster]
> master.HMaster: Failed to become active master
> java.io.IOException: Failed to initialize meta table
> at
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1077)
> ~[hbase-server-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2459)
> ~[hbase-server-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.master.HMaster.lambda$null$0(HMaster.java:590)
> ~[hbase-server-2.6.0.jar:2.6.0]
> at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:187)
> ~[hbase-common-2.6.0.jar:2.6.0]
> at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:177)
> ~[hbase-common-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.master.HMaster.lambda$run$1(HMaster.java:587)
> ~[hbase-server-2.6.0.jar:2.6.0]
> at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
> Caused by: org.apache.hadoop.hbase.procedure2.RemoteProcedureException:
> org.apache.hadoop.hbase.exceptions.TimeoutIOException: Operation timed out
> after 1.0010 sec
> at
> org.apache.hadoop.hbase.procedure2.Procedure.setFailure(Procedure.java:768)
> ~[hbase-procedure-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.procedure2.Procedure.setTimeoutFailure(Procedure.java:797)
> ~[hbase-procedure-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.executeTimedoutProcedure(TimeoutExecutorThread.java:131)
> ~[hbase-procedure-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.execDelayedProcedure(TimeoutExecutorThread.java:109)
> ~[hbase-procedure-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.run(TimeoutExecutorThread.java:68)
> ~[hbase-procedure-2.6.0.jar:2.6.0]
> Caused by: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Operation
> timed out after 1.0010 sec
> at
> org.apache.hadoop.hbase.procedure2.Procedure.setTimeoutFailure(Procedure.java:798)
> ~[hbase-procedure-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.executeTimedoutProcedure(TimeoutExecutorThread.java:131)
> ~[hbase-procedure-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.execDelayedProcedure(TimeoutExecutorThread.java:109)
> ~[hbase-procedure-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.run(TimeoutExecutorThread.java:68)
> ~[hbase-procedure-2.6.0.jar:2.6.0]
> 2024-09-04T16:04:48,060 ERROR [master/hmaster:16000:becomeActiveMaster]
> master.HMaster: Master server abort: loaded coprocessors are:
> [org.apache.hadoop.hbase.quotas.MasterQuotasObserver]
> 2024-09-04T16:04:48,060 ERROR [master/hmaster:16000:becomeActiveMaster]
> master.HMaster: ***** ABORTING master hmaster,16000,1725465872458: Unhandled
> exception. Starting shutdown. *****
> java.io.IOException: Failed to initialize meta table
> at
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1077)
> ~[hbase-server-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2459)
> ~[hbase-server-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.master.HMaster.lambda$null$0(HMaster.java:590)
> ~[hbase-server-2.6.0.jar:2.6.0]
> at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:187)
> ~[hbase-common-2.6.0.jar:2.6.0]
> at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:177)
> ~[hbase-common-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.master.HMaster.lambda$run$1(HMaster.java:587)
> ~[hbase-server-2.6.0.jar:2.6.0]
> at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
> Caused by: org.apache.hadoop.hbase.procedure2.RemoteProcedureException:
> org.apache.hadoop.hbase.exceptions.TimeoutIOException: Operation timed out
> after 1.0010 sec
> at
> org.apache.hadoop.hbase.procedure2.Procedure.setFailure(Procedure.java:768)
> ~[hbase-procedure-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.procedure2.Procedure.setTimeoutFailure(Procedure.java:797)
> ~[hbase-procedure-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.executeTimedoutProcedure(TimeoutExecutorThread.java:131)
> ~[hbase-procedure-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.execDelayedProcedure(TimeoutExecutorThread.java:109)
> ~[hbase-procedure-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.run(TimeoutExecutorThread.java:68)
> ~[hbase-procedure-2.6.0.jar:2.6.0]
> Caused by: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Operation
> timed out after 1.0010 sec
> at
> org.apache.hadoop.hbase.procedure2.Procedure.setTimeoutFailure(Procedure.java:798)
> ~[hbase-procedure-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.executeTimedoutProcedure(TimeoutExecutorThread.java:131)
> ~[hbase-procedure-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.execDelayedProcedure(TimeoutExecutorThread.java:109)
> ~[hbase-procedure-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.run(TimeoutExecutorThread.java:68)
> ~[hbase-procedure-2.6.0.jar:2.6.0]
> 2024-09-04T16:04:48,065 INFO [master/hmaster:16000:becomeActiveMaster]
> regionserver.HRegionServer: ***** STOPPING region server
> 'hmaster,16000,1725465872458' *****
> 2024-09-04T16:04:48,065 INFO [master/hmaster:16000:becomeActiveMaster]
> regionserver.HRegionServer: STOPPED: Stopped by
> master/hmaster:16000:becomeActiveMaster
> 2024-09-04T16:04:48,067 INFO [master/hmaster:16000]
> regionserver.HRegionServer: Stopping infoServer
> 2024-09-04T16:04:48,169 INFO [master/hmaster:16000] handler.ContextHandler:
> Stopped
> o.a.h.t.o.e.j.w.WebAppContext@78483404{master,/,null,STOPPED}{file:/hbase/hbase-2.6.0/hbase-webapps/master}
> 2024-09-04T16:04:48,180 INFO [master/hmaster:16000]
> server.AbstractConnector: Stopped ServerConnector@3a7ba368{HTTP/1.1,
> (http/1.1)}{0.0.0.0:16010}
> 2024-09-04T16:04:48,180 INFO [master/hmaster:16000] server.session: node0
> Stopped scavenging
> 2024-09-04T16:04:48,182 INFO [master/hmaster:16000] handler.ContextHandler:
> Stopped
> o.a.h.t.o.e.j.s.ServletContextHandler@b525e3a{static,/static,file:///hbase/hbase-2.6.0/hbase-webapps/static/,STOPPED}
> 2024-09-04T16:04:48,183 INFO [master/hmaster:16000] handler.ContextHandler:
> Stopped
> o.a.h.t.o.e.j.s.ServletContextHandler@294e4c35{logs,/logs,file:///var/log/hbase/,STOPPED}
> 2024-09-04T16:04:48,188 INFO [master/hmaster:16000]
> regionserver.HRegionServer: aborting server hmaster,16000,1725465872458
> 2024-09-04T16:04:48,190 INFO [master/hmaster:16000]
> regionserver.HRegionServer: stopping server hmaster,16000,1725465872458; all
> regions closed.
> 2024-09-04T16:04:48,191 WARN [OldWALsCleaner-1] cleaner.LogCleaner:
> Interrupted while cleaning old WALs, will try to clean it next round. Exiting.
> 2024-09-04T16:04:48,191 WARN [OldWALsCleaner-0] cleaner.LogCleaner:
> Interrupted while cleaning old WALs, will try to clean it next round. Exiting.
> 2024-09-04T16:04:48,194 INFO [master/hmaster:16000] hbase.ChoreService:
> Chore service for: master/hmaster:16000 had [] on shutdown
> 2024-09-04T16:04:48,196 INFO [master/hmaster:16000]
> procedure2.RemoteProcedureDispatcher: Stopping procedure remote dispatcher
> 2024-09-04T16:04:48,196 INFO [master/hmaster:16000]
> procedure2.ProcedureExecutor: Stopping
> 2024-09-04T16:04:48,205 INFO [master/hmaster:16000]
> region.RegionProcedureStore: Stopping the Region Procedure Store, isAbort=true
> 2024-09-04T16:04:48,208 WARN [master/hmaster:16000]
> master.ActiveMasterManager: Failed get of master address:
> java.io.IOException: Can't get master address from ZooKeeper; znode data ==
> null
> 2024-09-04T16:04:48,208 INFO [master/hmaster:16000]
> assignment.AssignmentManager: Stopping assignment manager
> 2024-09-04T16:04:48,209 INFO [master/hmaster:16000] region.MasterRegion:
> Closing local region {ENCODED => 1595e783b53d99cd5eef43b6debb2682, NAME =>
> 'master:store,,1.1595e783b53d99cd5eef43b6debb2682.', STARTKEY => '', ENDKEY
> => ''}, isAbort=true
> 2024-09-04T16:04:48,369 INFO [master/hmaster:16000] regionserver.HRegion:
> Closing region master:store,,1.1595e783b53d99cd5eef43b6debb2682.
> 2024-09-04T16:04:48,373 ERROR [master/hmaster:16000] regionserver.HRegion:
> Memstore data size is 2190 in region
> master:store,,1.1595e783b53d99cd5eef43b6debb2682.
> 2024-09-04T16:04:48,374 INFO [master/hmaster:16000] regionserver.HRegion:
> Closed master:store,,1.1595e783b53d99cd5eef43b6debb2682.
> 2024-09-04T16:04:48,374 INFO [master/hmaster:16000]
> flush.MasterFlushTableProcedureManager: stop: server shutting down.
> 2024-09-04T16:04:48,374 INFO [master:store-WAL-Roller]
> wal.AbstractWALRoller: LogRoller exiting.
> 2024-09-04T16:04:48,376 INFO [master/hmaster:16000] ipc.NettyRpcServer:
> Stopping server on /192.168.227.2:16000
> 2024-09-04T16:04:48,511 INFO [master/hmaster:16000] zookeeper.ZooKeeper:
> Session: 0x10009e5081e0000 closed
> 2024-09-04T16:04:48,511 INFO [main-EventThread] zookeeper.ClientCnxn:
> EventThread shut down for session: 0x10009e5081e0000
> 2024-09-04T16:04:48,511 INFO [master/hmaster:16000]
> regionserver.HRegionServer: Exiting; stopping=hmaster,16000,1725465872458;
> zookeeper connection closed.
> 2024-09-04T16:04:48,512 ERROR [main] master.HMasterCommandLine: Master exiting
> java.lang.RuntimeException: HMaster Aborted
> at
> org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:255)
> ~[hbase-server-2.6.0.jar:2.6.0]
> at
> org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:147)
> ~[hbase-server-2.6.0.jar:2.6.0]
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> ~[hadoop-common-2.10.2.jar:?]
> at
> org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:140)
> ~[hbase-server-2.6.0.jar:2.6.0]
> {code}
>
> Upgrading from 1.7.2 to 2.4.18 still succeeds.
> I am wondering whether this is a backward incompatibility or if there are any
> additional steps I missed to upgrade to 2.6.0.