Michael Stack created HBASE-23369:
-------------------------------------
Summary: Auto-close 'unknown' Regions reported as OPEN on
RegionServers
Key: HBASE-23369
URL: https://issues.apache.org/jira/browse/HBASE-23369
Project: HBase
Issue Type: Bug
Reporter: Michael Stack
In the old days, if a RegionServer reported state that disagreed with the
Master's view of the cluster, we'd kill the RegionServer.
Lately, in tests that overrun a cluster, the Master can start failing its
updates against Meta after sustained high load (CallQueueTooBigException; more
on this later). It can then lose accurate accounting of Region membership. In
one variant, a RegionServer reports its list of open Regions to the Master, and
either the Master doesn't 'know' of a particular Region, or it knows the Region
but expects it to be open on another RegionServer.
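To make the two variants concrete, below is a minimal sketch of the comparison the Master faces when a RegionServer reports. It rests on simplifying assumptions: the Master's view is modeled as a plain map from encoded Region name to expected server, and `RegionReportCheck`, `Verdict`, `classify` and `closeCandidates` are hypothetical names. This is not the AssignmentManager API or how HBase actually implements the check. Following the issue summary, only Regions the Master has no state for are picked as auto-close candidates. Whether Regions expected on another server should also be closed is left open here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical sketch (not HBase code): compare the Regions a RegionServer
 * reports as open against a simplified model of the Master's view.
 */
public class RegionReportCheck {

  /** How one reported Region relates to the Master's view. */
  enum Verdict {
    KNOWN_HERE,         // Master expects the Region open on the reporting server
    UNKNOWN,            // Master has no state at all for the Region
    EXPECTED_ELSEWHERE  // Master expects the Region open on a different server
  }

  /**
   * @param reportingServer server name as it appears in the report
   * @param reportedOpen    encoded Region names the server says are open
   * @param masterView      assumed model: encoded Region name -> expected server
   * @return a verdict per reported Region, sorted by Region name
   */
  static Map<String, Verdict> classify(String reportingServer,
      Set<String> reportedOpen, Map<String, String> masterView) {
    Map<String, Verdict> result = new TreeMap<>();
    for (String region : reportedOpen) {
      String expected = masterView.get(region);
      if (expected == null) {
        result.put(region, Verdict.UNKNOWN);
      } else if (expected.equals(reportingServer)) {
        result.put(region, Verdict.KNOWN_HERE);
      } else {
        result.put(region, Verdict.EXPECTED_ELSEWHERE);
      }
    }
    return result;
  }

  /** Regions to ask the server to close, rather than killing the whole server. */
  static List<String> closeCandidates(Map<String, Verdict> verdicts) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, Verdict> e : verdicts.entrySet()) {
      if (e.getValue() == Verdict.UNKNOWN) {
        out.add(e.getKey());
      }
    }
    return out;
  }

  public static void main(String[] args) {
    String rs = "server.example.org,16020,1575354666245";
    Map<String, String> view = new HashMap<>();
    view.put("aaa", rs);                                      // hypothetical Region names
    view.put("bbb", "other.example.org,16020,1575354666000");
    Set<String> reported =
        new TreeSet<>(List.of("aaa", "bbb", "ee78a0c951c1c902d8f3f3912394a0e5"));
    Map<String, Verdict> verdicts = classify(rs, reported, view);
    System.out.println(verdicts);
    System.out.println(closeCandidates(verdicts));
  }
}
```

Closing only the Regions in question, rather than killing the RegionServer as in the old days, keeps the rest of the server's Regions online while the Master's accounting recovers.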
Here is an example of what the Master logs each time the RegionServer reports:
{code}
2019-12-03 07:07:00,757 WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: No t1,08f5c285,1573094375485.ee78a0c951c1c902d8f3f3912394a0e5. RegionStateNode but reported ONLINE at server.example.org,16020,1575354666245 (inServerRegionList=false).
2019-12-03 07:07:03,793 WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: No t1,08f5c285,1573094375485.ee78a0c951c1c902d8f3f3912394a0e5. RegionStateNode but reported ONLINE at server.example.org,16020,1575354666245 (inServerRegionList=false).
{code}
This also shows up as an 'inconsistency' in the 'HBCK' tab of the Master UI.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)