Nick Dimiduk created HBASE-24293:
------------------------------------
Summary: Assignment manager should never give up assigning meta
Key: HBASE-24293
URL: https://issues.apache.org/jira/browse/HBASE-24293
Project: HBase
Issue Type: Bug
Components: master, Region Assignment
Affects Versions: 2.3.0
Reporter: Nick Dimiduk
Not yet sure how we got here, but,
{noformat}
2020-04-29 22:39:16,140 INFO
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: pid=308,
state=RUNNABLE:SERVER_CRASH_ASSIGN_META, locked=true; ServerCrashProcedure
server= host-a.example.com,16020,1588033841562, splitWal=true, meta=true found
a region state=OFFLINE, location=null, table=hbase:meta, region=1588230740
which is no longer on us host-a.example.com,16020,1588033841562, give up
assigning...
{noformat}
The assignment manager gives up on assigning meta in this procedure, and nothing
can progress. Manual intervention is necessary.
From this [conditional
block|https://github.com/apache/hbase/blob/1415a82d41a1e125440014a4b23364371b30d065/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L475],
it seems the {{regionNode}} location is {{null}}.
{noformat}
// This is possible, as when a server is dead, TRSP will fail to schedule a RemoteProcedure
// to us and then try to assign the region to a new RS. And before it has updated the region
// location to the new RS, we may have already called the am.getRegionsOnServer so we will
// consider the region is still on us. And then before we arrive here, the TRSP could have
// updated the region location, or even finished itself, so the region is no longer on us
// any more, we should not try to assign it again. Please see HBASE-23594 for more details.
if (!serverName.equals(regionNode.getRegionLocation())) {
  LOG.info("{} found a region {} which is no longer on us {}, give up assigning...", this,
    regionNode, serverName);
  continue;
}
{noformat}
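To make the failure mode concrete, here is a minimal, self-contained sketch of how the quoted check behaves when the location is {{null}}. It does not use HBase classes: {{MetaAssignSketch}}, {{RegionNode}}, {{skippedByQuotedCheck}} and {{skippedByMetaAwareCheck}} are hypothetical names invented for illustration, and server names are plain strings. The second method is only one possible direction (never skip an unassigned {{hbase:meta}}), not a proposed or committed fix.

```java
public class MetaAssignSketch {
    // Hypothetical stand-in for a region's state; NOT HBase's RegionStateNode.
    static final class RegionNode {
        final String table;
        final String location; // null when the region has no known location

        RegionNode(String table, String location) {
            this.table = table;
            this.location = location;
        }
    }

    // Same shape as the quoted conditional: skip the region when its recorded
    // location is not the crashed server. equals(null) is false, so a null
    // location always results in a skip.
    static boolean skippedByQuotedCheck(String crashedServer, RegionNode node) {
        return !crashedServer.equals(node.location);
    }

    // Assumed alternative, for illustration only: an hbase:meta region with no
    // location is never skipped, so something still tries to assign it.
    static boolean skippedByMetaAwareCheck(String crashedServer, RegionNode node) {
        boolean unassignedMeta = "hbase:meta".equals(node.table) && node.location == null;
        return !unassignedMeta && !crashedServer.equals(node.location);
    }

    public static void main(String[] args) {
        String crashed = "host-a.example.com,16020,1588033841562";
        RegionNode meta = new RegionNode("hbase:meta", null);
        System.out.println("quoted check skips meta: " + skippedByQuotedCheck(crashed, meta));
        System.out.println("meta-aware check skips meta: " + skippedByMetaAwareCheck(crashed, meta));
    }
}
```

With the state from the log above (state=OFFLINE, location=null), the quoted check skips meta, which matches the "give up assigning..." message.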