[ https://issues.apache.org/jira/browse/HBASE-20708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509129#comment-16509129 ]
Duo Zhang commented on HBASE-20708:
-----------------------------------
There is another problem with master restart: we need to find the RSes that
crashed after the old master quit and before the new master started. For now
this is done in two places, first in MasterMetaBootstrap.processDeadServers,
and then in AssignmentManager after we finish loading meta. And we now have
two guards for SCP. After master bootstrap we enable server crash processing,
but if the SCP is not for an RS carrying the meta region, it still needs to
wait for the AssignmentManager to finish loading meta.
I think we could make the logic a little simpler here. When the master starts,
we load all the procedures first but do not start the procedure workers,
initialize RegionServerTracker to get the list of currently online servers, and
scan the WAL directory to get the RSes which have been alive at some point.
Finally we can use this information to find the crashed RSes. And we can use
the loaded procedures to pick out the crashed RSes which have not been
processed, i.e., do not have an SCP yet.
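The filtering described above boils down to a set difference. A minimal sketch
of the idea, not the actual master code: server names are plain strings, and
the class name, method name, and the three input sets are all hypothetical
stand-ins for what the WAL directory scan, RegionServerTracker, and the loaded
procedures would provide.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, for illustration only; not part of HBase.
final class CrashedServerFinder {

  /**
   * Returns the servers that were alive at some point (they have a WAL
   * directory), are not online now, and have no SCP scheduled yet.
   */
  static Set<String> findUnprocessedCrashedServers(Set<String> serversWithWalDirs,
      Set<String> onlineServers, Set<String> serversWithScp) {
    // Start from every server that has left a WAL directory behind.
    Set<String> crashed = new TreeSet<>(serversWithWalDirs);
    // Servers still registered as online have not crashed.
    crashed.removeAll(onlineServers);
    // Servers that already have an SCP among the loaded procedures are handled.
    crashed.removeAll(serversWithScp);
    return crashed;
  }

  public static void main(String[] args) {
    // Made-up server names in host,port,startcode form.
    Set<String> walDirs = Set.of("rs1,16020,1", "rs2,16020,2", "rs3,16020,3");
    Set<String> online = Set.of("rs1,16020,1");
    Set<String> withScp = Set.of("rs2,16020,2");
    // Only rs3 is neither online nor covered by an existing SCP.
    System.out.println(findUnprocessedCrashedServers(walDirs, online, withScp));
  }
}
```

Since the procedure workers are not started yet at this point, the set of
loaded SCPs would not change while this runs, so the new SCPs could be
scheduled for the result without racing against the old ones.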
And I think we can remove the "enable server crash processing" guard. If an
SCP is for an RS with meta, then we could let it go until it reaches the
SERVER_CRASH_GET_REGIONS state.
And when reading the code, I found something strange: when updating the meta
location, we always mark it as OPEN.
{code}
private void updateMetaLocation(final RegionInfo regionInfo, final ServerName serverName)
    throws IOException {
  try {
    MetaTableLocator.setMetaLocation(master.getZooKeeper(), serverName,
      regionInfo.getReplicaId(), State.OPEN);
  } catch (KeeperException e) {
    throw new IOException(e);
  }
}
{code}
Any reason why we do this? [~stack] Thanks.
> Remove the usage of RecoverMetaProcedure in master startup
> ----------------------------------------------------------
>
> Key: HBASE-20708
> URL: https://issues.apache.org/jira/browse/HBASE-20708
> Project: HBase
> Issue Type: Bug
> Components: proc-v2, Region Assignment
> Reporter: Duo Zhang
> Priority: Blocker
> Fix For: 3.0.0, 2.1.0
>
>
> In HBASE-20700, we made RecoverMetaProcedure use a special lock which is only
> used by RMP to avoid deadlock with MoveRegionProcedure. But we always
> schedule an RMP when the master starts up, so we still need to make sure that
> there is no race between this RMP and other RMPs and SCPs scheduled before
> the master restarts.