[
https://issues.apache.org/jira/browse/HBASE-26914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
LiangJun He updated HBASE-26914:
--------------------------------
Description:
After HBASE-26245, we can rebuild a cluster from an existing root directory in
a cloud environment. One problem remains: if the regionservers of the old
cluster are stopped first and the masters afterwards, the new cluster (rebuilt
from the old cluster's root directory) may hang.
This problem is also described in HBASE-26898.
Hang Reason:
For example, if RS1 is stopped first, the Master expires RS1 and generates a
ServerCrashProcedure to reassign the region on RS1 (at the same time, the
Master deletes the RS1 node information saved in the local region). The
ServerCrashProcedure then generates a TransitRegionStateProcedure
sub-procedure, which plans to assign the region to RS2 (RS2 has not been
stopped at this point). Then RS2 is stopped, and the Master is stopped as well.
When a cluster is rebuilt from the existing root directory (the old cluster's
data on OSS), the Master recovers the procedure state and continues to execute
the unfinished procedures. At this point there is an uncompleted
TransitRegionStateProcedure whose target regionserver is RS2 of the old
cluster, which is no longer running, so the procedure hangs forever when
executed.
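The scenario can be pictured with a toy model. This is only a sketch, not HBase
code and not the fix for this issue: PendingAssign, findStaleTargets and the
server names are hypothetical, and the real TransitRegionStateProcedure carries
far more state. The sketch shows that an assignment recovered from the old root
directory can name a target server that never registers with the new Master.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class StaleTargetDemo {
    /** Hypothetical stand-in for a persisted, unfinished region assignment. */
    public static final class PendingAssign {
        public final String region;
        public final String targetServer;

        public PendingAssign(String region, String targetServer) {
            this.region = region;
            this.targetServer = targetServer;
        }
    }

    /** Returns the pending assignments whose target is not among the live servers. */
    public static List<PendingAssign> findStaleTargets(
            List<PendingAssign> pending, Set<String> liveServers) {
        List<PendingAssign> stale = new ArrayList<>();
        for (PendingAssign p : pending) {
            if (!liveServers.contains(p.targetServer)) {
                stale.add(p);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        // Recovered from the old root directory: the region was headed to old RS2.
        List<PendingAssign> recovered =
            List.of(new PendingAssign("region-1", "old-rs2"));
        // Servers that registered with the Master of the rebuilt cluster.
        Set<String> live = Set.of("new-rs1", "new-rs2");

        for (PendingAssign p : findStaleTargets(recovered, live)) {
            System.out.println(p.region + " targets unknown server " + p.targetServer);
        }
    }
}
```

In this model, an assignment flagged as stale would wait indefinitely unless it
is given a new target, which corresponds to the hang described above.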
was:
After HBASE-26245, we can rebuild a cluster from an existing root directory on
cloud environment, but we still have a problem, if we stop the regionservers
of the old cluster first, and then stop the masters, new cluster(rebuild from
old cluster's root directory) may hang.
This problem is also described in HBASE-26898.
Hang reason:
> Rebuild a cluster from an existing root directory may hang
> ----------------------------------------------------------
>
> Key: HBASE-26914
> URL: https://issues.apache.org/jira/browse/HBASE-26914
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 3.0.0-alpha-2
> Reporter: LiangJun He
> Assignee: LiangJun He
> Priority: Major
> Fix For: 3.0.0-alpha-2
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)