kaushik mandal created HBASE-26568:
--------------------------------------

             Summary: hbase master got stuck after running couple of days in 
Azure setup
                 Key: HBASE-26568
                 URL: https://issues.apache.org/jira/browse/HBASE-26568
             Project: HBase
          Issue Type: Bug
          Components: hbase-thirdparty
         Environment: Azure cloud
            Reporter: kaushik mandal
         Attachments: hbase-master-log-0.txt, hbase-master-log-1.txt

hadoop hbase version 2.0.1
hadoop hdfs version 2.7.7

 

In an Azure cluster setup, the HBase master hangs or stops responding after 
running for a couple of days.

The only way to recover the HBase master is to delete /hbase and restart it. 
Below is the error reported in the hbase-master log.

 

Error message

==============

2021-11-18 13:06:55,396 INFO [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16000] assignment.AssignProcedure: Retry=10 of max=10; pid=320, ppid=319, state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=hbase:meta, region=1588230740; rit=OPENING, location=nokiainfra-altiplano-hbase-regionserver-1.nokiainfra-altiplano-hbase-regionserver.default.svc.cluster.local,16020,1637238611975
2021-11-18 13:06:55,396 INFO [PEWorker-16] assignment.AssignProcedure: Retry=11 of max=10; pid=320, ppid=319, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta, region=1588230740; rit=OFFLINE, location=null
2021-11-18 13:06:55,944 ERROR [PEWorker-16] procedure2.ProcedureExecutor: CODE-BUG: Uncaught runtime exception for pid=319, state=FAILED:RECOVER_META_ASSIGN_REGIONS, exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max attempts exceeded; RecoverMetaProcedure failedMetaServer=null, splitWal=true
java.lang.UnsupportedOperationException: unhandled state=RECOVER_META_ASSIGN_REGIONS
    at org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.rollbackState(RecoverMetaProcedure.java:209)
    at org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.rollbackState(RecoverMetaProcedure.java:52)
    at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
    at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1372)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1328)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1197)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
2021-11-18 13:06:55,958 ERROR [PEWorker-16] procedure2.ProcedureExecutor: CODE-BUG: Uncaught runtime exception for pid=319, state=FAILED:RECOVER_META_ASSIGN_REGIONS, exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max attempts exceeded; RecoverMetaProcedure failedMetaServer=null, splitWal=true
java.lang.UnsupportedOperationException: unhandled state=RECOVER_META_ASSIGN_REGIONS
    at org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.rollbackState(RecoverMetaProcedure.java:209)
    at org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.rollbackState(RecoverMetaProcedure.java:52)
    at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
    at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1372)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1328)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1197)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
2021-11-18 13:06:55,969 ERROR [PEWorker-16] procedure2.ProcedureExecutor: CODE-BUG: Uncaught runtime exception for pid=319, state=FAILED:RECOVER_META_ASSIGN_REGIONS, exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max attempts exceeded; RecoverMetaProcedure failedMetaServer=null, splitWal=true
java.lang.UnsupportedOperationException: unhandled state=RECOVER_META_ASSIGN_REGIONS
    at org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.rollbackState(RecoverMetaProcedure.java:209)
    at org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.rollbackState(RecoverMetaProcedure.java:52)
    at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
    at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1372)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1328)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1197)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
2021-11-18 13:06:55,970 WARN [PEWorker-16] procedure2.ProcedureExecutor: Worker terminating UNNATURALLY null
java.lang.ArrayIndexOutOfBoundsException: 2
    at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405)
    at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178)
    at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513)
    at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505)
    at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741)
    at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691)
    at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1406)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1328)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1197)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
2021-11-18 13:07:46,268 INFO [ReadOnlyZKClient-altiplano-zookeeper:2181@0x7e131580] zookeeper.ZooKeeper: Session: 0x200000efa5dfae6 closed

============================================================
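
To check whether a master log shows the same pattern as above (an AssignProcedure retry count passing its maximum, followed by repeated CODE-BUG rollback errors for the parent procedure), a small scanner can help. This is only a sketch: the function name summarize_master_log and the regular expressions are illustrative assumptions derived from the log lines quoted in this report, not part of HBase.

```python
import re
import sys
from collections import Counter

# Patterns derived from the log lines quoted in this report (assumed, not an HBase API).
RETRY_RE = re.compile(r"Retry=(\d+) of max=(\d+); pid=(\d+)")
CODE_BUG_RE = re.compile(r"CODE-BUG: Uncaught runtime exception for pid=(\d+)")


def summarize_master_log(text):
    """Return (sorted pids whose retry count exceeded max, CODE-BUG count per pid)."""
    # Collapse whitespace so entries wrapped across several lines still match.
    flat = " ".join(text.split())
    exhausted = sorted(
        {int(pid) for n, mx, pid in RETRY_RE.findall(flat) if int(n) > int(mx)}
    )
    code_bugs = Counter(int(pid) for pid in CODE_BUG_RE.findall(flat))
    return exhausted, dict(code_bugs)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_master_log.py <path to master log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        exhausted_pids, bug_counts = summarize_master_log(fh.read())
    print("pids with retries exhausted:", exhausted_pids)
    print("CODE-BUG errors per pid:", bug_counts)
```

Run against one of the attached logs (for example hbase-master-log-0.txt), it reports which procedure ids exhausted their retries and how often the rollback error repeated for each pid.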

 

Error Message:

============================================================

==> /opt/hbase-2.0.1/logs/hbase--master-nokiainfra-altiplano-hbase-master-0.log <==
2021-12-02 12:43:51,351 INFO  [RpcServer.default.FPBQ.Fifo.handler=129,queue=12,port=16000] master.ServerManager: Registering regionserver=nokiainfra-altiplano-hbase-regionserver-1.nokiainfra-altiplano-hbase-regionserver.default.svc.cluster.local,16020,1638449029563
2021-12-02 12:43:54,699 ERROR [RpcServer.default.FPBQ.Fifo.handler=129,queue=12,port=16000] master.MasterRpcServices: lock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:372)
    at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper$6.addBlock(FanOutOneBlockAsyncDFSOutputHelper.java:380)
    at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.createOutput(FanOutOneBlockAsyncDFSOutputHelper.java:774)
    ... 24 more

2021-12-02 12:43:54,746 INFO  [main-EventThread] master.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [nokiainfra-altiplano-hbase-regionserver-1.nokiainfra-altiplano-hbase-regionserver.default.svc.cluster.local,16020,1638449029563]
2021-12-02 12:43:54,746 INFO  [main-EventThread] master.ServerManager: Processing expiration of nokiainfra-altiplano-hbase-regionserver-1.nokiainfra-altiplano-hbase-regionserver.default.svc.cluster.local,16020,1638449029563 on nokiainfra-altiplano-hbase-master-0.nokiainfra-altiplano-hbase-master.default.svc.cluster.local,16000,1638448730439
2021-12-02 12:43:54,860 INFO  [PEWorker-10] procedure.ServerCrashProcedure: Start pid=10, state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure server=nokiainfra-altiplano-hbase-regionserver-1.nokiainfra-altiplano-hbase-regionserver.default.svc.cluster.local,16020,1638449029563, splitWal=true, meta=false

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
