[ 
https://issues.apache.org/jira/browse/HBASE-25238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25238.
-----------------------------------
    Fix Version/s: 2.3.4
                   2.2.7
                   2.4.0
                   3.0.0-alpha-1
     Hadoop Flags: Reviewed
     Release Note: Fixes master procedure store migration issues going from 
2.0.x to 2.2.x and/or 2.3.x. Also fixes failed heartbeat parse during rolling 
upgrade from 2.0.x. to 2.3.x.
         Assignee: Michael Stack
       Resolution: Fixed

Merged to 2.2+ (half of the patch only went into 2.2 – full patch elsewhere). 
Thanks for review [~vjasani]

> Upgrading HBase from 2.2.0 to 2.3.x fails because of “Message missing 
> required fields: state”
> ---------------------------------------------------------------------------------------------
>
>                 Key: HBASE-25238
>                 URL: https://issues.apache.org/jira/browse/HBASE-25238
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Zhuqi Jin
>            Assignee: Michael Stack
>            Priority: Critical
>             Fix For: 3.0.0-alpha-1, 2.4.0, 2.2.7, 2.3.4
>
>
> When we upgraded HBASE cluster from 2.0.0-RC0 to 2.3.0 or 2.3.3, the HMaster 
> on upgraded node failed to start.
> The error message is shown below: 
> {code:java}
> 2020-11-02 23:04:01,998 ERROR [master/2c4006997f99:16000:becomeActiveMaster] 
> master.HMaster: Failed to become active 
> masterorg.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  Message missing required fields: state           at 
> org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:79)
>            at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:68)
>            at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:120)
>            at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:125)
>            at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:48)
>            at 
> org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:228)      
>      at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:124)
>            at 
> org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.deserializeStateData(RegionRemoteProcedureBase.java:352)
>            at 
> org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure.deserializeStateData(OpenRegionProcedure.java:72)
>            at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:294)
>            at 
> org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
>            at 
> org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
>            at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore$1.load(RegionProcedureStore.java:194)
>            at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore$2.load(WALProcedureStore.java:474)
>            at 
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormatReader.finish(ProcedureWALFormatReader.java:151)
>            at 
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat.load(ProcedureWALFormat.java:103)
>            at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.load(WALProcedureStore.java:465)
>            at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.tryMigrate(RegionProcedureStore.java:184)
>            at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.recoverLease(RegionProcedureStore.java:257)
>            at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:587)
>            at 
> org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1572)
>            at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:950)
>            at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2240)
>            at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:622)         
>   at java.lang.Thread.run(Thread.java:748)2020-11-02 23:04:01,998 ERROR 
> [master/2c4006997f99:16000:becomeActiveMaster] master.HMaster: ***** ABORTING 
> master 2c4006997f99,16000,1604358237412: Unhandled exception. Starting 
> shutdown. 
> *****org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  Message missing required fields: state           at 
> org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:79)
>            at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:68)
>            at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:120)
>            at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:125)
>            at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:48)
>            at 
> org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:228)      
>      at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:124)
>            at 
> org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.deserializeStateData(RegionRemoteProcedureBase.java:352)
>            at 
> org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure.deserializeStateData(OpenRegionProcedure.java:72)
>            at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:294)
>            at 
> org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
>            at 
> org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
>            at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore$1.load(RegionProcedureStore.java:194)
>            at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore$2.load(WALProcedureStore.java:474)
>            at 
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormatReader.finish(ProcedureWALFormatReader.java:151)
>            at 
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat.load(ProcedureWALFormat.java:103)
>            at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.load(WALProcedureStore.java:465)
>            at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.tryMigrate(RegionProcedureStore.java:184)
>            at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.recoverLease(RegionProcedureStore.java:257)
>            at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:587)
>            at 
> org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1572)
>            at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:950)
>            at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2240)
>            at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:622)         
>   at java.lang.Thread.run(Thread.java:748)2020-11-02 23:04:01,999 INFO  
> [master/2c4006997f99:16000:becomeActiveMaster] regionserver.HRegionServer: 
> ***** STOPPING region server '2c4006997f99,16000,1604358237412' 
> *****2020-11-02 23:04:01,999 INFO  
> [master/2c4006997f99:16000:becomeActiveMaster] regionserver.HRegionServer: 
> STOPPED: Stopped by master/2c4006997f99:16000:becomeActiveMaster2020-11-02 
> 23:04:02,814 INFO  [2c4006997f99:16000.splitLogManager..Chore.1] 
> hbase.ScheduledChore: Chore: SplitLogManager Timeout Monitor was 
> stopped2020-11-02 23:04:03,652 INFO  [master/2c4006997f99:16000] 
> ipc.NettyRpcServer: Stopping server on /252.17.1.2:160002020-11-02 
> 23:04:03,658 INFO  [master/2c4006997f99:16000] regionserver.HRegionServer: 
> Stopping infoServer2020-11-02 23:04:03,871 INFO  [master/2c4006997f99:16000] 
> handler.ContextHandler: Stopped 
> o.e.j.w.WebAppContext@6136998b{/,null,UNAVAILABLE}{file:/hbase/hbase-webapps/master}2020-11-02
>  23:04:03,877 INFO  [master/2c4006997f99:16000] server.AbstractConnector: 
> Stopped 
> ServerConnector@60d1b21f{HTTP/1.1,[http/1.1]}{0.0.0.0:16010}2020-11-02 
> 23:04:03,878 INFO  [master/2c4006997f99:16000] handler.ContextHandler: 
> Stopped 
> o.e.j.s.ServletContextHandler@aa004a0{/static,file:///hbase/hbase-webapps/static/,UNAVAILABLE}2020-11-02
>  23:04:03,878 INFO  [master/2c4006997f99:16000] handler.ContextHandler: 
> Stopped 
> o.e.j.s.ServletContextHandler@5965844d{/logs,file:///hbase/logs/,UNAVAILABLE}2020-11-02
>  23:04:03,888 INFO  [master/2c4006997f99:16000] regionserver.HRegionServer: 
> aborting server 2c4006997f99,16000,16043582374122020-11-02 23:04:03,889 INFO  
> [master/2c4006997f99:16000] regionserver.HRegionServer: stopping server 
> 2c4006997f99,16000,1604358237412; all regions closed.2020-11-02 23:04:03,889 
> INFO  [master/2c4006997f99:16000] hbase.ChoreService: Chore service for: 
> master/2c4006997f99:16000 had [] on shutdown2020-11-02 23:04:03,890 INFO  
> [master/2c4006997f99:16000] region.RegionProcedureStore: Stopping the Region 
> Procedure Store, isAbort=true2020-11-02 23:04:03,894 WARN  
> [master/2c4006997f99:16000] master.ActiveMasterManager: Failed get of master 
> address: java.io.IOException: Can't get master address from ZooKeeper; znode 
> data == null2020-11-02 23:04:03,894 INFO  [master/2c4006997f99:16000] 
> region.MasterRegion: Closing local region {ENCODED => 
> 1595e783b53d99cd5eef43b6debb2682, NAME => 
> 'master:store,,1.1595e783b53d99cd5eef43b6debb2682.', STARTKEY => '', ENDKEY 
> => ''}, isAbort=true2020-11-02 23:04:03,901 INFO  [master/2c4006997f99:16000] 
> regionserver.HRegion: Closing region 
> master:store,,1.1595e783b53d99cd5eef43b6debb2682.2020-11-02 23:04:03,903 INFO 
>  [master/2c4006997f99:16000] regionserver.HRegion: Closed 
> master:store,,1.1595e783b53d99cd5eef43b6debb2682.2020-11-02 23:04:03,903 INFO 
>  [master/2c4006997f99:16000] hbase.ChoreService: Chore service for: 
> 2c4006997f99:16000.splitLogManager. had [] on shutdown2020-11-02 23:04:03,904 
> INFO  [master:store-WAL-Roller] wal.AbstractWALRoller: LogRoller 
> exiting.2020-11-02 23:04:03,998 INFO  
> [ReadOnlyZKClient-252.17.1.5:2181@0x0432f3aa] zookeeper.ZooKeeper: Session: 
> 0x10138d353e5001b closed2020-11-02 23:04:03,998 INFO  
> [ReadOnlyZKClient-252.17.1.5:2181@0x0432f3aa-EventThread] 
> zookeeper.ClientCnxn: EventThread shut down for session: 
> 0x10138d353e5001b2020-11-02 23:04:04,098 INFO  [master/2c4006997f99:16000] 
> zookeeper.ZooKeeper: Session: 0x10138d353e50018 closed2020-11-02 23:04:04,098 
> INFO  [master/2c4006997f99:16000] regionserver.HRegionServer: Exiting; 
> stopping=2c4006997f99,16000,1604358237412; zookeeper connection 
> closed.2020-11-02 23:04:04,098 INFO  [main-EventThread] zookeeper.ClientCnxn: 
> EventThread shut down for session: 0x10138d353e500182020-11-02 23:04:04,099 
> ERROR [main] master.HMasterCommandLine: Master 
> exitingjava.lang.RuntimeException: HMaster Aborted           at 
> org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:244)
>            at 
> org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
>            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)       
>     at 
> org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
>            at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:3072)
> {code}
>  
> It can be reproduced through the following steps:
>  # Start up a cluster of version 2.0.0-RC0 with 3 nodes
>  # Use hbase pe to write data. 
>  # 
> {code:java}
> /hbase/bin/hbase pe --nomapred --oneCon=true --valueSize=10 --rows=100 
> sequentialWrite 1
> {code}
>  # Stop the cluster:
>  ## using the graceful_stop.sh to stop all regionservers.
>  ## Then run stop-hbase.sh
>  #  Upgrade the node to branch-2.3
>  # After upgraded, as the log, hbase--master-2c4006997f99.log, suggested, 
> HMaster failed to start.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to