kaushik mandal created HBASE-26754:
--------------------------------------

             Summary: hbase master crash after running couple of days with 
error STUCK Region-In-Transition rit=FAILED_OPEN, location=null, 
table=hbase:meta, region=xxxxxxxxxx
                 Key: HBASE-26754
                 URL: https://issues.apache.org/jira/browse/HBASE-26754
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.4.8
            Reporter: kaushik mandal


The HBase master stops responding after running for a couple of days, and the
region servers keep restarting.

We are seeing the warnings below in the master and region server logs:

WARN [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition rit=FAILED_OPEN, location=null, table=hbase:meta, region=xxxxxxxxxxxxx

[master/xxxx-infra-xxxxx-hbase-master-0:16000.Chore.3] master.HMaster: Not running balancer because processing dead regionserver(s):
2022-02-07 19:54:11,512 INFO [ReadOnlyZKClient-xxxxxx-zookeeper:2181@0x2fcc92d9] zookeeper.ZooKeeper: Initiating client connection, connectString=xxxx-zookeeper:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$158/0x000000010057b440@48d2e00b

WARN [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition rit=FAILED_OPEN, location=null, table=hbase:meta, region=1588230740
2022-02-07 19:54:15,643 INFO [hconnection-0x31420403-shared-pool7-t9731] client.RpcRetryingCallerImpl:
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3223)
 at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1414)
 at org.apache.hadoop.hbase.regionserver.RSRpcServices.newRegionScanner(RSRpcServices.java:2947)
 at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3272)
 at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42002)
 at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
 at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
 at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
 at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
, details=row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx-infra-xxxx-hbase-regionserver-0.xxx-infra-xxxx-hbase-regionserver.default.svc.cluster.local,16020,1644089730940, seqNum=-1


From the region server logs:

2022-02-05 19:39:16,722 WARN [RpcServer.default.FPBQ.Fifo.handler=109,queue=5,port=16020] regionserver.RSRpcServices: Client tried to access missing scanner 0
2022-02-05 19:39:16,722 WARN [RpcServer.default.FPBQ.Fifo.handler=25,queue=12,port=16020] regionserver.RSRpcServices: Client tried to access missing scanner 0
2022-02-05 19:39:16,721 WARN [RpcServer.default.FPBQ.Fifo.handler=24,queue=11,port=16020] regionserver.RSRpcServices: Client tried to access missing scanner 0
2022-02-05 19:39:16,721 WARN [RpcServer.default.FPBQ.Fifo.handler=112,queue=8,port=16020] regionserver.RSRpcServices: Client tried to access missing scanner 0
2022-02-05 19:39:16,721 WARN [RpcServer.default.FPBQ.Fifo.handler=40,queue=1,port=16020] regionserver.RSRpcServices: Client tried to access missing scanner 0

==> /opt/hbase-2.0.1/logs/SecurityAuth.audit <==
2022-02-05 19:39:17,882 INFO SecurityLogger.org.apache.hadoop.hbase.Server: Auth successful for hdfs (auth:)
2022-02-05 19:39:17,882 INFO SecurityLogger.org.apache.hadoop.hbase.Server: Connection from 10.42.0.124 port: 44876 with unknown version info
2022-02-05 19:40:18,307 INFO SecurityLogger.org.apache.hadoop.hbase.Server: Auth successful for hdfs (auth:)
2022-02-05 19:40:18,307 INFO SecurityLogger.org.apache.hadoop.hbase.Server: Connection from 10.42.0.124 port: 51098 with unknown version info

==> /opt/hbase-2.0.1/logs/hbase--regionserver-xxxx-infra-xxxxx-hbase-regionserver-0.log <==
2022-02-05 19:40:32,848 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=300.98 KB, freeSize=399.71 MB, max=400 MB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=29, evicted=0, evictedPerRun=0.0



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
