[ https://issues.apache.org/jira/browse/HBASE-26864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510172#comment-17510172 ]
Huaxiang Sun commented on HBASE-26864:
--------------------------------------
Thanks for the explanation. I had assumed that a report is associated with a
procId, and that the master would discard a report when there is no outstanding
procedure.
For this specific case, there is a bug in the rollback handling in
SplitTableRegionProcedure; I am preparing a patch.
[https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/SplitTableRegionProcedure.java#L304]
[https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/SplitTableRegionProcedure.java#L385]
{code:java}
// In the state machine:
case SPLIT_TABLE_REGION_CLOSE_PARENT_REGION:
  addChildProcedure(createUnassignProcedures(env));
  // Comments from HX:
  // createUnassignProcedures() can throw an IOException. If this happens,
  // it won't reach state SPLIT_TABLE_REGIONS_CHECK_CLOSED_REGIONS and no
  // parent region is closed, as all created UnassignProcedures are rolled
  // back. If it rolls back with state SPLIT_TABLE_REGION_CLOSE_PARENT_REGION,
  // there is no need to call openParentRegion(); calling it would result in
  // an OpenRegionProcedure for an already open region.
  setNextState(SplitTableRegionState.SPLIT_TABLE_REGIONS_CHECK_CLOSED_REGIONS);
  break;

// In the rollback:
case SPLIT_TABLE_REGIONS_CHECK_CLOSED_REGIONS:
  // Doing nothing, in SPLIT_TABLE_REGION_CLOSE_PARENT_REGION,
  // we will bring parent region online
  break;
case SPLIT_TABLE_REGION_CLOSE_PARENT_REGION:
  // Comments from HX:
  // openParentRegion() should not be called here, as explained above.
  openParentRegion(env);
  break;
{code}
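To make the reasoning above concrete, here is a minimal, self-contained toy model of one possible rollback shape. It is only a sketch and not the actual patch: the class SplitRollbackSketch, the shortened state names, and the string "openParentRegion" are hypothetical stand-ins, and it assumes that reaching the check-closed state means the unassign child procedures have run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the rollback decision discussed above; this is not HBase code. */
final class SplitRollbackSketch {

  /** Hypothetical, shortened stand-ins for the two split states in question. */
  enum State { CLOSE_PARENT_REGION, CHECK_CLOSED_REGIONS }

  /**
   * Walks the reached states in reverse and returns the compensating actions.
   * Assumption: a state appears in {@code reached} only if it was entered.
   */
  static List<String> rollback(List<State> reached) {
    List<String> actions = new ArrayList<>();
    for (int i = reached.size() - 1; i >= 0; i--) {
      switch (reached.get(i)) {
        case CHECK_CLOSED_REGIONS:
          // Assumption: reaching this state means the unassign children ran,
          // so the parent is closed and has to be brought back online.
          actions.add("openParentRegion");
          break;
        case CLOSE_PARENT_REGION:
          // Creating the unassign procedures may have failed here, in which
          // case the parent never closed and there is nothing to reopen.
          break;
      }
    }
    return actions;
  }

  public static void main(String[] args) {
    // Failure while still in the close-parent state: no reopen is issued.
    System.out.println(rollback(Arrays.asList(State.CLOSE_PARENT_REGION))); // prints []
    // Failure after the parent was closed: the parent is reopened once.
    System.out.println(rollback(
        Arrays.asList(State.CLOSE_PARENT_REGION, State.CHECK_CLOSED_REGIONS))); // prints [openParentRegion]
  }
}
```

In this model the reopen is tied to the state that proves the parent was closed, so a failure inside createUnassignProcedures() never produces an open request for a region that is still online.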
> Region Server does not send Ack back to master after receiving an
> OpenRegionReq for already opened regions, causing OpenRegionProcedure stay
> forever.
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-26864
> URL: https://issues.apache.org/jira/browse/HBASE-26864
> Project: HBase
> Issue Type: Bug
> Components: Region Assignment
> Affects Versions: 2.4.10
> Reporter: Huaxiang Sun
> Assignee: Huaxiang Sun
> Priority: Major
>
> For some upgrade cases, we found that the master issues a RegionOpen for an
> already open region and the Region Server simply logs
> {code:java}
> 2022-03-17 22:16:55,595 WARN
> org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler: Received
> OPEN for
> foo,b2875fcb-7bc0-4fa9-a980-e902faf7f151,1631771037620.def199cc7208615b783b285f582ddfa4.
> which is already online {code}
> and it does not ack or nack the master. This OpenRegionProcedure is stuck
> forever. In this specific case, it needs to ack to the master that the region
> is open.
>
> The cause of why the master sent an OpenRegion request for an already open
> region will be followed up in another issue.
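
The behaviour requested in the description can be sketched with a small toy model. This is only an illustration under stated assumptions: OpenAckSketch, handleOpen, and the "procId:OPENED" report strings are hypothetical and do not correspond to the AssignRegionHandler code or any HBase API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy region server; all names here are hypothetical, not HBase APIs. */
final class OpenAckSketch {

  private final Set<String> onlineRegions = new HashSet<>();

  /** Stand-in for reports sent to the master, recorded as "procId:OPENED". */
  final List<String> reportsToMaster = new ArrayList<>();

  void handleOpen(String regionName, long procId) {
    if (onlineRegions.contains(regionName)) {
      // Already online: still acknowledge, so the master-side procedure
      // identified by procId does not wait forever.
      reportsToMaster.add(procId + ":OPENED");
      return;
    }
    onlineRegions.add(regionName); // pretend the region was opened here
    reportsToMaster.add(procId + ":OPENED");
  }

  public static void main(String[] args) {
    OpenAckSketch rs = new OpenAckSketch();
    rs.handleOpen("region-a", 1L);
    rs.handleOpen("region-a", 2L); // duplicate OPEN for an online region
    System.out.println(rs.reportsToMaster); // prints [1:OPENED, 2:OPENED]
  }
}
```

The point of the sketch is only that the duplicate request produces a report as well, rather than just a log line.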
--
This message was sent by Atlassian Jira
(v8.20.1#820001)