[ https://issues.apache.org/jira/browse/HBASE-26754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
kaushik mandal updated HBASE-26754:
-----------------------------------
Component/s: master
> hbase master crash after running couple of days with error STUCK
> Region-In-Transition rit=FAILED_OPEN, location=null, table=hbase:meta,
> region=xxxxxxxxxx
> ---------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-26754
> URL: https://issues.apache.org/jira/browse/HBASE-26754
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 2.4.8
> Reporter: kaushik mandal
> Priority: Major
>
> The HBase master stops responding after running for a couple of days, and
> the region server keeps restarting.
> We are seeing the warnings below in the master and region server logs.
>
>
> WARN [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition rit=FAILED_OPEN, location=null, table=hbase:meta, region=xxxxxxxxxxxxx
> [master/xxxx-infra-xxxxx-hbase-master-0:16000.Chore.3] master.HMaster: Not running balancer because processing dead regionserver(s):
> 2022-02-07 19:54:11,512 INFO [ReadOnlyZKClient-xxxxxx-zookeeper:2181@0x2fcc92d9] zookeeper.ZooKeeper: Initiating client connection, connectString=xxxx-zookeeper:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$158/0x000000010057b440@48d2e00b
>
>
> WARN [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition rit=FAILED_OPEN, location=null, table=hbase:meta, region=1588230740
> 2022-02-07 19:54:15,643 INFO [hconnection-0x31420403-shared-pool7-t9731] client.RpcRetryingCallerImpl:
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3223)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1414)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.newRegionScanner(RSRpcServices.java:2947)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3272)
>     at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42002)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> , details=row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx-infra-xxxx-hbase-regionserver-0.xxx-infra-xxxx-hbase-regionserver.default.svc.cluster.local,16020,1644089730940, seqNum=-1
>
> From the region server logs:
> 2022-02-05 19:39:16,722 WARN [RpcServer.default.FPBQ.Fifo.handler=109,queue=5,port=16020] regionserver.RSRpcServices: Client tried to access missing scanner 0
> 2022-02-05 19:39:16,722 WARN [RpcServer.default.FPBQ.Fifo.handler=25,queue=12,port=16020] regionserver.RSRpcServices: Client tried to access missing scanner 0
> 2022-02-05 19:39:16,721 WARN [RpcServer.default.FPBQ.Fifo.handler=24,queue=11,port=16020] regionserver.RSRpcServices: Client tried to access missing scanner 0
> 2022-02-05 19:39:16,721 WARN [RpcServer.default.FPBQ.Fifo.handler=112,queue=8,port=16020] regionserver.RSRpcServices: Client tried to access missing scanner 0
> 2022-02-05 19:39:16,721 WARN [RpcServer.default.FPBQ.Fifo.handler=40,queue=1,port=16020] regionserver.RSRpcServices: Client tried to access missing scanner 0
>
> ==> /opt/hbase-2.0.1/logs/SecurityAuth.audit <==
> 2022-02-05 19:39:17,882 INFO SecurityLogger.org.apache.hadoop.hbase.Server: Auth successful for hdfs (auth:)
> 2022-02-05 19:39:17,882 INFO SecurityLogger.org.apache.hadoop.hbase.Server: Connection from 10.42.0.124 port: 44876 with unknown version info
> 2022-02-05 19:40:18,307 INFO SecurityLogger.org.apache.hadoop.hbase.Server: Auth successful for hdfs (auth:)
> 2022-02-05 19:40:18,307 INFO SecurityLogger.org.apache.hadoop.hbase.Server: Connection from 10.42.0.124 port: 51098 with unknown version info
>
> ==> /opt/hbase-2.0.1/logs/hbase--regionserver-xxxx-infra-xxxxx-hbase-regionserver-0.log <==
> 2022-02-05 19:40:32,848 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=300.98 KB, freeSize=399.71 MB, max=400 MB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=29, evicted=0, evictedPerRun=0.0
--
This message was sent by Atlassian Jira
(v8.20.10#820010)