[ 
https://issues.apache.org/jira/browse/HBASE-26864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510172#comment-17510172
 ] 

Huaxiang Sun commented on HBASE-26864:
--------------------------------------

Thanks for the explanation. I had assumed that each report is associated with a 
procId, and that the master would discard a report when there is no 
outstanding procedure.

For this specific case, there is a bug in the rollback handling in 
SplitTableRegionProcedure. I am preparing a patch.

[https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/SplitTableRegionProcedure.java#L304]

[https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/SplitTableRegionProcedure.java#L385]
{code:java}
// In the state machine:

        case SPLIT_TABLE_REGION_CLOSE_PARENT_REGION:
          addChildProcedure(createUnassignProcedures(env));
          // Comments from HX:
          // createUnassignProcedures() can throw an IOException. If this happens,
          // it won't reach state SPLIT_TABLE_REGIONS_CHECK_CLOSED_REGIONS and no
          // parent region is closed, as all created UnassignProcedures are rolled
          // back. If it rolls back with state SPLIT_TABLE_REGION_CLOSE_PARENT_REGION,
          // there is no need to call openParentRegion(); otherwise, it will result
          // in an OpenRegionProcedure for an already open region.
          setNextState(SplitTableRegionState.SPLIT_TABLE_REGIONS_CHECK_CLOSED_REGIONS);
          break;

// In the rollback:

        case SPLIT_TABLE_REGIONS_CHECK_CLOSED_REGIONS:
          // Doing nothing, in SPLIT_TABLE_REGION_CLOSE_PARENT_REGION,
          // we will bring parent region online
          break;
        case SPLIT_TABLE_REGION_CLOSE_PARENT_REGION:
          // Comments from HX:
          // openParentRegion() should not be called here, as explained above.
          openParentRegion(env);
          break;
{code}
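
To illustrate the point, here is a minimal, self-contained sketch of one way the rollback could be arranged so that the parent is reopened only once CHECK_CLOSED_REGIONS has been reached. This is not the actual patch: the class SplitRollbackSketch, the shortened state names, and the counter are assumptions made up for this example, and the real procedure has more states.

```java
public class SplitRollbackSketch {
    // Simplified, hypothetical subset of the split states, in execution order.
    public enum State { PREPARE, CLOSE_PARENT_REGION, CHECK_CLOSED_REGIONS, CREATE_DAUGHTER_REGIONS }

    // Counts how often the (simulated) openParentRegion() would be invoked.
    private int reopenCalls = 0;

    // Rollback of a single state, mirroring the shape of the switch quoted above.
    void rollbackState(State state) {
        switch (state) {
            case CHECK_CLOSED_REGIONS:
                // The parent was really closed by now, so bring it back online.
                reopenCalls++;
                break;
            case CLOSE_PARENT_REGION:
                // If we failed here, the unassign procedures were rolled back
                // and the parent is still open: nothing to do.
                break;
            default:
                break;
        }
    }

    // Simulates a failure at failedAt: every state up to and including it
    // is rolled back, newest first. Returns the number of reopen calls.
    public static int reopenCallsWhenFailingAt(State failedAt) {
        SplitRollbackSketch proc = new SplitRollbackSketch();
        for (int i = failedAt.ordinal(); i >= 0; i--) {
            proc.rollbackState(State.values()[i]);
        }
        return proc.reopenCalls;
    }

    public static void main(String[] args) {
        for (State s : State.values()) {
            System.out.println(s + " -> reopen calls: " + reopenCallsWhenFailingAt(s));
        }
    }
}
```

With this arrangement, a failure in CLOSE_PARENT_REGION issues no open request for the still-open parent, while a failure at or after CHECK_CLOSED_REGIONS reopens it exactly once.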

> Region Server does not send Ack back to master after receiving an 
> OpenRegionReq for already opened regions, causing OpenRegionProcedure stay 
> forever.
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-26864
>                 URL: https://issues.apache.org/jira/browse/HBASE-26864
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 2.4.10
>            Reporter: Huaxiang Sun
>            Assignee: Huaxiang Sun
>            Priority: Major
>
> In some upgrade cases, we found that the master issues RegionOpen for an 
> already open region and the Region Server simply logs 
> {code:java}
> 2022-03-17 22:16:55,595 WARN org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler: Received OPEN for foo,b2875fcb-7bc0-4fa9-a980-e902faf7f151,1631771037620.def199cc7208615b783b285f582ddfa4. which is already online
> {code}
> and it does not ack or nack the master. This OpenRegionProcedure is stuck forever.
> In this specific case, it needs to ack to the master that the region is open. 
>  
> The cause of why it sent an OpenRegion request for an already open 
> region will be followed up in another issue.
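
The behavior asked for in the description quoted above can be sketched roughly as follows. This is only an illustration under assumptions: OpenAckSketch, handleOpen, and the in-memory report list are hypothetical names for this example and are not the AssignRegionHandler API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class OpenAckSketch {
    // Hypothetical stand-in for the set of regions online on this server.
    private final Set<String> onlineRegions = new HashSet<>();
    // Hypothetical stand-in for reports sent back to the master.
    private final List<String> reportsToMaster = new ArrayList<>();

    // Handles an OPEN request. The key point: an already-online region
    // still produces an OPENED report, so the master-side procedure
    // identified by procId is not left waiting forever.
    public void handleOpen(String encodedRegionName, long procId) {
        if (onlineRegions.contains(encodedRegionName)) {
            System.out.println("Received OPEN for " + encodedRegionName
                + " which is already online, acking anyway");
            reportsToMaster.add("OPENED:" + encodedRegionName + ":" + procId);
            return;
        }
        onlineRegions.add(encodedRegionName);
        reportsToMaster.add("OPENED:" + encodedRegionName + ":" + procId);
    }

    public List<String> getReportsToMaster() {
        return reportsToMaster;
    }

    public static void main(String[] args) {
        OpenAckSketch rs = new OpenAckSketch();
        rs.handleOpen("def199cc7208615b783b285f582ddfa4", 1L);
        rs.handleOpen("def199cc7208615b783b285f582ddfa4", 2L); // duplicate OPEN
        System.out.println(rs.getReportsToMaster());
    }
}
```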



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
