[ https://issues.apache.org/jira/browse/HBASE-24293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17096856#comment-17096856 ]
Nick Dimiduk commented on HBASE-24293:
--------------------------------------
It looks like this condition was added to patch a different hole, via HBASE-23594.
> Assignment manager should never give up assigning meta
> ------------------------------------------------------
>
> Key: HBASE-24293
> URL: https://issues.apache.org/jira/browse/HBASE-24293
> Project: HBase
> Issue Type: Bug
> Components: master, Region Assignment
> Affects Versions: 2.3.0
> Reporter: Nick Dimiduk
> Priority: Critical
>
> Not yet sure how we got here, but,
> {noformat}
> 2020-04-29 22:39:16,140 INFO org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: pid=308, state=RUNNABLE:SERVER_CRASH_ASSIGN_META, locked=true; ServerCrashProcedure server= host-a.example.com,16020,1588033841562, splitWal=true, meta=true found a region state=OFFLINE, location=null, table=hbase:meta, region=1588230740 which is no longer on us host-a.example.com,16020,1588033841562, give up assigning...
> {noformat}
> Assignment manager gives up on this procedure and nothing can progress.
> Manual intervention is necessary.
> From this [conditional
> block|https://github.com/apache/hbase/blob/1415a82d41a1e125440014a4b23364371b30d065/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L475],
> it seems the {{regionNode}} location is {{null}}.
> {noformat}
> // This is possible, as when a server is dead, TRSP will fail to schedule a RemoteProcedure
> // to us and then try to assign the region to a new RS. And before it has updated the region
> // location to the new RS, we may have already called the am.getRegionsOnServer so we will
> // consider the region is still on us. And then before we arrive here, the TRSP could have
> // updated the region location, or even finished itself, so the region is no longer on us
> // any more, we should not try to assign it again. Please see HBASE-23594 for more details.
> if (!serverName.equals(regionNode.getRegionLocation())) {
>   LOG.info("{} found a region {} which is no longer on us {}, give up assigning...", this,
>     regionNode, serverName);
>   continue;
> }
> {noformat}
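
To make the failure mode concrete, here is a minimal standalone sketch of how the quoted condition behaves when the region location is null. It does not use HBase classes: servers are modeled as plain strings, and MetaLocationCheck, Decision, currentCheck, and metaAwareCheck are hypothetical names invented for illustration. The metaAwareCheck variant is only one conceivable guard, written under the assumption that a null location on hbase:meta should never be skipped; it is not the fix adopted for this issue.

```java
final class MetaLocationCheck {

    // Hypothetical stand-in for the two outcomes of the quoted conditional:
    // fall through and assign the region, or log "give up assigning" and continue.
    enum Decision { ASSIGN, SKIP }

    // Mirrors the quoted check: skip whenever the recorded location
    // differs from the crashed server. equals(null) is always false,
    // so a null location always ends in SKIP.
    static Decision currentCheck(String crashedServer, String regionLocation) {
        if (!crashedServer.equals(regionLocation)) {
            return Decision.SKIP;
        }
        return Decision.ASSIGN;
    }

    // Assumption, for illustration only: treat a meta region with no
    // recorded location as still needing assignment, and otherwise
    // defer to the existing check.
    static Decision metaAwareCheck(String crashedServer, String regionLocation, boolean isMeta) {
        if (isMeta && regionLocation == null) {
            return Decision.ASSIGN;
        }
        return currentCheck(crashedServer, regionLocation);
    }

    public static void main(String[] args) {
        String crashed = "host-a.example.com,16020,1588033841562";

        // The case from the log: state=OFFLINE, location=null, table=hbase:meta.
        System.out.println(currentCheck(crashed, null));          // SKIP
        System.out.println(metaAwareCheck(crashed, null, true));  // ASSIGN

        // The HBASE-23594 case the condition was written for: another
        // procedure has already moved the region to a different server.
        String other = "host-b.example.com,16020,1588033841999";
        System.out.println(metaAwareCheck(crashed, other, true)); // SKIP
    }
}
```

In this model the existing check cannot tell "the region has moved to another server" apart from "the region has no location at all", which matches the log line above where location=null still produces the "give up assigning" message.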