kaushik mandal created HBASE-26568:
--------------------------------------
Summary: HBase master gets stuck after running for a couple of days in Azure setup
Key: HBASE-26568
URL: https://issues.apache.org/jira/browse/HBASE-26568
Project: HBase
Issue Type: Bug
Components: hbase-thirdparty
Environment: Azure cloud
Reporter: kaushik mandal
Attachments: hbase-master-log-0.txt, hbase-master-log-1.txt
HBase version: 2.0.1
Hadoop HDFS version: 2.7.7
In an Azure cluster setup, the HBase master hangs or stops responding after
running for a couple of days, and the only way to recover it is to delete
/hbase and restart. Below are the errors logged by the HBase master.
Error message
==============
2021-11-18 13:06:55,396 INFO [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16000] assignment.AssignProcedure: Retry=10 of max=10; pid=320, ppid=319, state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=hbase:meta, region=1588230740; rit=OPENING, location=nokiainfra-altiplano-hbase-regionserver-1.nokiainfra-altiplano-hbase-regionserver.default.svc.cluster.local,16020,1637238611975
2021-11-18 13:06:55,396 INFO [PEWorker-16] assignment.AssignProcedure: Retry=11 of max=10; pid=320, ppid=319, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta, region=1588230740; rit=OFFLINE, location=null
2021-11-18 13:06:55,944 ERROR [PEWorker-16] procedure2.ProcedureExecutor: CODE-BUG: Uncaught runtime exception for pid=319, state=FAILED:RECOVER_META_ASSIGN_REGIONS, exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max attempts exceeded; RecoverMetaProcedure failedMetaServer=null, splitWal=true
java.lang.UnsupportedOperationException: unhandled state=RECOVER_META_ASSIGN_REGIONS
    at org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.rollbackState(RecoverMetaProcedure.java:209)
    at org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.rollbackState(RecoverMetaProcedure.java:52)
    at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
    at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1372)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1328)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1197)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
[The same ERROR entry and stack trace for pid=319 are logged again at 2021-11-18 13:06:55,958 and 2021-11-18 13:06:55,969.]
2021-11-18 13:06:55,970 WARN [PEWorker-16] procedure2.ProcedureExecutor: Worker terminating UNNATURALLY null
java.lang.ArrayIndexOutOfBoundsException: 2
    at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405)
    at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178)
    at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513)
    at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505)
    at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741)
    at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691)
    at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1406)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1328)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1197)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
2021-11-18 13:07:46,268 INFO [ReadOnlyZKClient-altiplano-zookeeper:2181@0x7e131580] zookeeper.ZooKeeper: Session: 0x200000efa5dfae6 closed
============================================================
Error Message:
============================================================
==> /opt/hbase-2.0.1/logs/hbase--master-nokiainfra-altiplano-hbase-master-0.log <==
2021-12-02 12:43:51,351 INFO [RpcServer.default.FPBQ.Fifo.handler=129,queue=12,port=16000] master.ServerManager: Registering regionserver=nokiainfra-altiplano-hbase-regionserver-1.nokiainfra-altiplano-hbase-regionserver.default.svc.cluster.local,16020,1638449029563
2021-12-02 12:43:54,699 ERROR [RpcServer.default.FPBQ.Fifo.handler=129,queue=12,port=16000] master.MasterRpcServices: lock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:372)
    at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper$6.addBlock(FanOutOneBlockAsyncDFSOutputHelper.java:380)
    at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.createOutput(FanOutOneBlockAsyncDFSOutputHelper.java:774)
    ... 24 more
2021-12-02 12:43:54,746 INFO [main-EventThread] master.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [nokiainfra-altiplano-hbase-regionserver-1.nokiainfra-altiplano-hbase-regionserver.default.svc.cluster.local,16020,1638449029563]
2021-12-02 12:43:54,746 INFO [main-EventThread] master.ServerManager: Processing expiration of nokiainfra-altiplano-hbase-regionserver-1.nokiainfra-altiplano-hbase-regionserver.default.svc.cluster.local,16020,1638449029563 on nokiainfra-altiplano-hbase-master-0.nokiainfra-altiplano-hbase-master.default.svc.cluster.local,16000,1638448730439
2021-12-02 12:43:54,860 INFO [PEWorker-10] procedure.ServerCrashProcedure: Start pid=10, state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure server=nokiainfra-altiplano-hbase-regionserver-1.nokiainfra-altiplano-hbase-regionserver.default.svc.cluster.local,16020,1638449029563, splitWal=true, meta=false
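
For triage, it may help to see how often each procedure hits the CODE-BUG path in the attached master logs. The snippet below is only a rough sketch: the function name `count_code_bugs` and the regular expression are illustrative assumptions, not part of HBase, and the pattern is inferred solely from the ERROR lines quoted in the first excerpt above.

```python
import re
from collections import Counter

# Pattern inferred from the ProcedureExecutor ERROR lines quoted in this report
# (an assumption; other HBase versions may word the message differently).
CODE_BUG = re.compile(
    r"CODE-BUG: Uncaught runtime exception for pid=(\d+),\s*state=([A-Z_:]+)"
)


def count_code_bugs(log_text):
    """Count CODE-BUG errors in log_text, keyed by (pid, state)."""
    return Counter(
        (int(pid), state) for pid, state in CODE_BUG.findall(log_text)
    )


# Shortened sample built from the ERROR lines in the excerpt above.
sample = "\n".join(
    f"2021-11-18 13:06:55,{ms} ERROR [PEWorker-16] procedure2.ProcedureExecutor: "
    "CODE-BUG: Uncaught runtime exception for pid=319, "
    "state=FAILED:RECOVER_META_ASSIGN_REGIONS, exception=..."
    for ms in ("944", "958", "969")
)

for (pid, state), n in count_code_bugs(sample).items():
    print(f"pid={pid} state={state} occurrences={n}")

# To scan an attachment instead of the sample (file name taken from this issue):
# with open("hbase-master-log-0.txt") as fh:
#     print(count_code_bugs(fh.read()))
```

Because the pattern allows whitespace after the comma, it also matches copies of the log that were hard-wrapped between `pid=` and `state=`.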