[
https://issues.apache.org/jira/browse/HBASE-22365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833376#comment-16833376
]
Duo Zhang commented on HBASE-22365:
-----------------------------------
I think there are two ways to fix the problem.
1. Add a check when updating the region state in RegionRemoteProcedureBase, under
the read lock of ServerStateNode. If the server is dead then we go to the CRASH
state. The advantage here is that it does not change the logic much, but the
problem is that it may introduce a deadlock when the meta region is also on the
target (dead) region server, and if we want to fix the deadlock, there will be
more work to do, which will make the code more complicated.
2. Change the memory state right after we persist the procedure state in
reportRegionStateTransition, and only update the state in meta later when the
procedure is woken up to run. The advantage here is that semantically it is
cleaner, as the region server will think the state transition is successful if
reportRegionStateTransition returns normally. But the problem is that there
will be more side effects as we change the logic a lot, and we also need to
keep the same behavior when the master restarts.
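To make the two options concrete, here is a minimal sketch of both. Every class
and field below (RegionSketch, ServerNodeSketch, OpenProcedureSketch, the
in-memory procedureStore and meta map) is a hypothetical stand-in invented for
illustration; none of it is HBase's actual code, and the real change would also
have to handle the deadlock and master-restart concerns described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// All classes here are hypothetical stand-ins, not HBase's real classes.
final class RegionSketch {
    final String name;
    volatile String state = "OPENING";
    volatile String location; // null until a server reports the region open

    RegionSketch(String name) { this.name = name; }
}

// Option 1: reject a reported transition once the reporting server is marked dead.
final class ServerNodeSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private boolean dead; // guarded by lock

    // Crash handling takes the write lock, so it waits for in-flight reports.
    void markDead() {
        lock.writeLock().lock();
        try {
            dead = true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Returns false and leaves the region untouched if the server is already dead.
    boolean reportOpened(RegionSketch region, String server) {
        lock.readLock().lock();
        try {
            if (dead) {
                return false;
            }
            region.state = "OPEN";
            region.location = server;
            return true;
        } finally {
            lock.readLock().unlock();
        }
    }
}

// Option 2: memory is updated in the report call; meta is updated on wake-up.
final class OpenProcedureSketch {
    final List<String> procedureStore = new ArrayList<>(); // stands in for the procedure store
    final Map<String, String> meta = new HashMap<>();      // stands in for hbase:meta
    private final RegionSketch region;
    private String pendingServer;

    OpenProcedureSketch(RegionSketch region) { this.region = region; }

    void reportOpened(String server) {
        procedureStore.add("OPENED " + region.name + " on " + server); // persist first
        region.state = "OPEN";      // then update memory before returning to the caller
        region.location = server;
        pendingServer = server;
    }

    // Runs later, when the procedure is woken up.
    void execute() {
        if (pendingServer != null) {
            meta.put(region.name, pendingServer);
            pendingServer = null;
        }
    }
}
```

In the option 1 sketch, a report that arrives after markDead() is refused, so
the region is left for crash handling. In the option 2 sketch, the in-memory
state is already OPEN when reportOpened() returns, while the meta map is only
written once execute() runs.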
I prefer solution 2, but anyway, let me provide a UT first.
> Region may be opened in two RegionServers
> -----------------------------------------
>
> Key: HBASE-22365
> URL: https://issues.apache.org/jira/browse/HBASE-22365
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 3.0.0, 2.2.0, 2.3.0
> Reporter: Guanghao Zhang
> Assignee: Duo Zhang
> Priority: Blocker
>
> Found this problem when running ITBLL with our internal branch, which is based
> on branch-2.2, so marking this as a blocker for 2.2.0. A region
> 7ebdca9cd09e26074749b546586e2156 was moved from RS-st99 to RS-st98 and the
> TRSP succeeded. Meanwhile, RS-st99 crashed and a new SCP was scheduled for it.
> The SCP initialized subprocedures for 7ebdca9cd09e26074749b546586e2156, too.
> Then 7ebdca9cd09e26074749b546586e2156 was assigned to two RegionServers.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)