[ 
https://issues.apache.org/jira/browse/HBASE-20708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509129#comment-16509129
 ] 

Duo Zhang commented on HBASE-20708:
-----------------------------------

Another problem with master restart is that we need to find the RSes that 
crashed after the old master quit and before the new master started. Currently 
this is done in two places: first in MasterMetaBootstrap.processDeadServers, 
and then in AssignmentManager after we finish loading meta. As a result we have 
two guards for SCP. After master bootstrap we enable server crash processing, 
but if the SCP is not for an RS hosting a meta region, it still has to wait 
for the assignment manager to finish loading meta.

I think we could make the logic a little simpler here. When the master starts, 
we load all the procedures first but do not start the procedure workers, 
initialize RegionServerTracker to get the list of currently online servers, and 
scan the WAL directory to get the RSes that have been alive at some point. We 
can then use this information to find the crashed RSes, and use the loaded 
procedures to pick out the RSes that have not been processed yet, i.e. those 
that do not have an SCP yet.
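
A minimal sketch of the filtering step described above, assuming server names 
are plain strings and the three input sets have already been collected. 
CrashedServerFinder and findUnprocessedCrashedServers are hypothetical names 
used only for illustration; they are not existing HBase classes or methods.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of HBase. Server names are plain strings here.
final class CrashedServerFinder {
  private CrashedServerFinder() {}

  /**
   * @param onlineServers  servers currently reported as online
   *                       (assumed to come from RegionServerTracker)
   * @param walDirServers  servers that have a directory under the WAL dir,
   *                       i.e. servers that were alive at some point
   * @param serversWithScp servers already covered by a loaded SCP
   * @return crashed servers that still need an SCP, in sorted order
   */
  static Set<String> findUnprocessedCrashedServers(Set<String> onlineServers,
      Set<String> walDirServers, Set<String> serversWithScp) {
    // Start from every server that left a WAL directory behind.
    Set<String> crashed = new TreeSet<>(walDirServers);
    // A server that is still online has not crashed.
    crashed.removeAll(onlineServers);
    // A server that already has an SCP does not need another one.
    crashed.removeAll(serversWithScp);
    return crashed;
  }
}
```

The inputs are not modified, so the same sets can be reused by the rest of the 
startup code.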

I also think we can remove the "enable server crash processing" guard. If an 
SCP is for an RS hosting meta, we could let it run until it reaches the 
SERVER_CRASH_GET_REGIONS state.

While reading the code I also found something strange: when updating the meta 
location, we always mark it as OPEN.

{code}
  private void updateMetaLocation(final RegionInfo regionInfo, final ServerName serverName)
      throws IOException {
    try {
      MetaTableLocator.setMetaLocation(master.getZooKeeper(), serverName,
        regionInfo.getReplicaId(), State.OPEN);
    } catch (KeeperException e) {
      throw new IOException(e);
    }
  }
{code}

Any reason why we do this? [~stack] Thanks.



> Remove the usage of RecoverMetaProcedure in master startup
> ----------------------------------------------------------
>
>                 Key: HBASE-20708
>                 URL: https://issues.apache.org/jira/browse/HBASE-20708
>             Project: HBase
>          Issue Type: Bug
>          Components: proc-v2, Region Assignment
>            Reporter: Duo Zhang
>            Priority: Blocker
>             Fix For: 3.0.0, 2.1.0
>
>
> In HBASE-20700, we made RecoverMetaProcedure use a special lock which is only 
> used by RMP to avoid deadlock with MoveRegionProcedure. But we always 
> schedule an RMP when the master starts up, so we still need to make sure that 
> there is no race between this RMP and other RMPs and SCPs scheduled before 
> the master restarts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
