[ 
https://issues.apache.org/jira/browse/HBASE-24293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17096856#comment-17096856
 ] 

Nick Dimiduk commented on HBASE-24293:
--------------------------------------

Looks like this condition was patching a different hole, via HBASE-23594.

> Assignment manager should never give up assigning meta
> ------------------------------------------------------
>
>                 Key: HBASE-24293
>                 URL: https://issues.apache.org/jira/browse/HBASE-24293
>             Project: HBase
>          Issue Type: Bug
>          Components: master, Region Assignment
>    Affects Versions: 2.3.0
>            Reporter: Nick Dimiduk
>            Priority: Critical
>
> Not yet sure how we got here, but,
> {noformat}
> 2020-04-29 22:39:16,140 INFO 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: pid=308, 
> state=RUNNABLE:SERVER_CRASH_ASSIGN_META, locked=true; ServerCrashProcedure 
> server= host-a.example.com,16020,1588033841562, splitWal=true, meta=true 
> found a region state=OFFLINE, location=null, table=hbase:meta, 
> region=1588230740 which is no longer on us 
> host-a.example.com,16020,1588033841562, give up assigning...
> {noformat}
> Assignment manager gives up on this procedure and nothing can progress. 
> Manual intervention is necessary.
> From this [conditional block|https://github.com/apache/hbase/blob/1415a82d41a1e125440014a4b23364371b30d065/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L475], it seems the {{regionNode}} location is {{null}}.
> {noformat}
>         // This is possible, as when a server is dead, TRSP will fail to schedule a RemoteProcedure
>         // to us and then try to assign the region to a new RS. And before it has updated the region
>         // location to the new RS, we may have already called the am.getRegionsOnServer so we will
>         // consider the region is still on us. And then before we arrive here, the TRSP could have
>         // updated the region location, or even finished itself, so the region is no longer on us
>         // any more, we should not try to assign it again. Please see HBASE-23594 for more details.
>         if (!serverName.equals(regionNode.getRegionLocation())) {
>           LOG.info("{} found a region {} which is no longer on us {}, give up assigning...", this,
>             regionNode, serverName);
>           continue;
>         }
>           continue;
>         }
> {noformat}
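
For anyone following along, here is a minimal sketch of why the quoted guard takes the skip branch when the location is null. It is not HBase code: the class MetaAssignCheck and the method givesUp are hypothetical names, and plain strings stand in for ServerName and the region node's location. Only the shape of the comparison comes from the snippet above.

```java
public class MetaAssignCheck {
    // Hypothetical stand-in for the guard in ServerCrashProcedure.
    // Plain strings replace HBase's ServerName and the region node's location.
    static boolean givesUp(String crashedServer, String regionLocation) {
        // Same shape as !serverName.equals(regionNode.getRegionLocation()).
        // equals(null) returns false, so a null location always means "skip".
        return !crashedServer.equals(regionLocation);
    }

    public static void main(String[] args) {
        String crashed = "host-a.example.com,16020,1588033841562";
        // Region still recorded on the crashed server: the guard does not skip.
        System.out.println(givesUp(crashed, crashed));
        // Region already moved to another server (hypothetical name): skip.
        System.out.println(givesUp(crashed, "host-b.example.com,16020,1588033900000"));
        // Region has no location at all, as in the log above: also skip.
        System.out.println(givesUp(crashed, null));
    }
}
```

In this sketch the comparison cannot tell "the region has moved to another server" apart from "the region has no location at all"; both take the skip branch.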



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
