[
https://issues.apache.org/jira/browse/HBASE-26568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Josh Elser resolved HBASE-26568.
--------------------------------
Resolution: Workaround
Resolving as "Workaround"; the workaround is to upgrade.
> hbase master got stuck after running couple of days in Azure setup
> ------------------------------------------------------------------
>
> Key: HBASE-26568
> URL: https://issues.apache.org/jira/browse/HBASE-26568
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.0.1
> Environment: Azure cloud
> Reporter: kaushik mandal
> Priority: Major
> Attachments: hbase-master-log-0.txt, hbase-master-log-1.txt
>
>
> HBase version: 2.0.1
> Hadoop HDFS version: 2.7.7
>
> In an Azure cluster setup, the HBase master hangs or stops responding after
> running for a couple of days, and the only way to recover it is to delete
> /hbase and restart. Below is the error logged by the hbase-master.
>
> Error message
> ==============
> 2021-11-18 13:06:55,396 INFO
> [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16000]
> assignment.AssignProcedure: Retry=10 of max=10; pid=320, ppid=319,
> state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=hbase:meta,
> region=1588230740; rit=OPENING,
> location=nokiainfra-altiplano-hbase-regionserver-1.nokiainfra-altiplano-hbase-regionserver.default.svc.cluster.local,16020,1637238611975
> 2021-11-18 13:06:55,396 INFO [PEWorker-16] assignment.AssignProcedure:
> Retry=11 of max=10; pid=320, ppid=319,
> state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta,
> region=1588230740; rit=OFFLINE, location=null
> 2021-11-18 13:06:55,944 ERROR
> [PEWorker-16] procedure2.ProcedureExecutor: CODE-BUG: Uncaught runtime
> exception for pid=319, state=FAILED:RECOVER_META_ASSIGN_REGIONS,
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max
> attempts exceeded; RecoverMetaProcedure failedMetaServer=null, splitWal=true
> java.lang.UnsupportedOperationException: unhandled
> state=RECOVER_META_ASSIGN_REGIONS at
> org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.rollbackState(RecoverMetaProcedure.java:209)
> at
> org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.rollbackState(RecoverMetaProcedure.java:52)
> at
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
> at
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1372)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1328)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1197)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
> 2021-11-18 13:06:55,958 ERROR [PEWorker-16] procedure2.ProcedureExecutor:
> CODE-BUG: Uncaught runtime exception for pid=319,
> state=FAILED:RECOVER_META_ASSIGN_REGIONS,
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max
> attempts exceeded; RecoverMetaProcedure failedMetaServer=null, splitWal=true
> java.lang.UnsupportedOperationException: unhandled
> state=RECOVER_META_ASSIGN_REGIONS at
> org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.rollbackState(RecoverMetaProcedure.java:209)
> at
> org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.rollbackState(RecoverMetaProcedure.java:52)
> at
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
> at
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1372)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1328)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1197)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
> 2021-11-18 13:06:55,969 ERROR [PEWorker-16] procedure2.ProcedureExecutor:
> CODE-BUG: Uncaught runtime exception for pid=319,
> state=FAILED:RECOVER_META_ASSIGN_REGIONS,
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max
> attempts exceeded; RecoverMetaProcedure failedMetaServer=null, splitWal=true
> java.lang.UnsupportedOperationException: unhandled
> state=RECOVER_META_ASSIGN_REGIONS at
> org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.rollbackState(RecoverMetaProcedure.java:209)
> at
> org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.rollbackState(RecoverMetaProcedure.java:52)
> at
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
> at
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1372)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1328)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1197)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
> 2021-11-18 13:06:55,970 WARN [PEWorker-16] procedure2.ProcedureExecutor:
> Worker terminating UNNATURALLY null java.lang.ArrayIndexOutOfBoundsException:
> 2 at
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405)
> at
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178)
> at
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513)
> at
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505)
> at
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741)
> at
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691)
> at
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1406)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1328)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1197)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
> 2021-11-18 13:07:46,268 INFO
> [ReadOnlyZKClient-altiplano-zookeeper:2181@0x7e131580] zookeeper.ZooKeeper:
> Session: 0x200000efa5dfae6 closed
> ============================================================
>
> Error Message:
> ============================================================
> ==>
> /opt/hbase-2.0.1/logs/hbase--master-nokiainfra-altiplano-hbase-master-0.log
> <==
> 2021-12-02 12:43:51,351 INFO
> [RpcServer.default.FPBQ.Fifo.handler=129,queue=12,port=16000]
> master.ServerManager: Registering
> regionserver=nokiainfra-altiplano-hbase-regionserver-1.nokiainfra-altiplano-hbase-regionserver.default.svc.cluster.local,16020,1638449029563
> 2021-12-02 12:43:54,699 ERROR
> [RpcServer.default.FPBQ.Fifo.handler=129,queue=12,port=16000]
> master.MasterRpcServices: lock(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:372)
> at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper$6.addBlock(FanOutOneBlockAsyncDFSOutputHelper.java:380)
> at
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.createOutput(FanOutOneBlockAsyncDFSOutputHelper.java:774)
> ... 24 more
> 2021-12-02 12:43:54,746 INFO [main-EventThread] master.RegionServerTracker:
> RegionServer ephemeral node deleted, processing expiration
> [nokiainfra-altiplano-hbase-regionserver-1.nokiainfra-altiplano-hbase-regionserver.default.svc.cluster.local,16020,1638449029563]
> 2021-12-02 12:43:54,746 INFO [main-EventThread] master.ServerManager:
> Processing expiration of
> nokiainfra-altiplano-hbase-regionserver-1.nokiainfra-altiplano-hbase-regionserver.default.svc.cluster.local,16020,1638449029563
> on
> nokiainfra-altiplano-hbase-master-0.nokiainfra-altiplano-hbase-master.default.svc.cluster.local,16000,1638448730439
> 2021-12-02 12:43:54,860 INFO [PEWorker-10] procedure.ServerCrashProcedure:
> Start pid=10, state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure
> server=nokiainfra-altiplano-hbase-regionserver-1.nokiainfra-altiplano-hbase-regionserver.default.svc.cluster.local,16020,1638449029563,
> splitWal=true, meta=false
>