kaushik mandal created HBASE-26754:
--------------------------------------
Summary: hbase master crash after running couple of days with
error STUCK Region-In-Transition rit=FAILED_OPEN, location=null,
table=hbase:meta, region=xxxxxxxxxx
Key: HBASE-26754
URL: https://issues.apache.org/jira/browse/HBASE-26754
Project: HBase
Issue Type: Bug
Affects Versions: 2.4.8
Reporter: kaushik mandal
The HBase master stops responding after running for a couple of days, and the
region servers keep restarting.
We are seeing the warnings below in the master and region server logs:
WARN [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition rit=FAILED_OPEN, location=null, table=hbase:meta, region=xxxxxxxxxxxxx
[master/xxxx-infra-xxxxx-hbase-master-0:16000.Chore.3] master.HMaster: Not running balancer because processing dead regionserver(s):
2022-02-07 19:54:11,512 INFO [ReadOnlyZKClient-xxxxxx-zookeeper:2181@0x2fcc92d9] zookeeper.ZooKeeper: Initiating client connection, connectString=xxxx-zookeeper:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$158/0x000000010057b440@48d2e00b
WARN [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition rit=FAILED_OPEN, location=null, table=hbase:meta, region=1588230740
2022-02-07 19:54:15,643 INFO [hconnection-0x31420403-shared-pool7-t9731] client.RpcRetryingCallerImpl:
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3223)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1414)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.newRegionScanner(RSRpcServices.java:2947)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3272)
at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42002)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
, details=row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx-infra-xxxx-hbase-regionserver-0.xxx-infra-xxxx-hbase-regionserver.default.svc.cluster.local,16020,1644089730940, seqNum=-1
From the region server logs:
2022-02-05 19:39:16,722 WARN [RpcServer.default.FPBQ.Fifo.handler=109,queue=5,port=16020] regionserver.RSRpcServices: Client tried to access missing scanner 0
2022-02-05 19:39:16,722 WARN [RpcServer.default.FPBQ.Fifo.handler=25,queue=12,port=16020] regionserver.RSRpcServices: Client tried to access missing scanner 0
2022-02-05 19:39:16,721 WARN [RpcServer.default.FPBQ.Fifo.handler=24,queue=11,port=16020] regionserver.RSRpcServices: Client tried to access missing scanner 0
2022-02-05 19:39:16,721 WARN [RpcServer.default.FPBQ.Fifo.handler=112,queue=8,port=16020] regionserver.RSRpcServices: Client tried to access missing scanner 0
2022-02-05 19:39:16,721 WARN [RpcServer.default.FPBQ.Fifo.handler=40,queue=1,port=16020] regionserver.RSRpcServices: Client tried to access missing scanner 0
==> /opt/hbase-2.0.1/logs/SecurityAuth.audit <==
2022-02-05 19:39:17,882 INFO SecurityLogger.org.apache.hadoop.hbase.Server: Auth successful for hdfs (auth:)
2022-02-05 19:39:17,882 INFO SecurityLogger.org.apache.hadoop.hbase.Server: Connection from 10.42.0.124 port: 44876 with unknown version info
2022-02-05 19:40:18,307 INFO SecurityLogger.org.apache.hadoop.hbase.Server: Auth successful for hdfs (auth:)
2022-02-05 19:40:18,307 INFO SecurityLogger.org.apache.hadoop.hbase.Server: Connection from 10.42.0.124 port: 51098 with unknown version info
==> /opt/hbase-2.0.1/logs/hbase--regionserver-xxxx-infra-xxxxx-hbase-regionserver-0.log <==
2022-02-05 19:40:32,848 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=300.98 KB, freeSize=399.71 MB, max=400 MB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=29, evicted=0, evictedPerRun=0.0