cuibo01 commented on a change in pull request #2084:
URL: https://github.com/apache/hbase/pull/2084#discussion_r456984183



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
##########
@@ -846,10 +849,37 @@ private void finishActiveMasterInitialization(MonitoredTask status)
     if (isStopped()) return;
 
     status.setStatus("Submitting log splitting work for previously failed region servers");
+
+    // grab the list of procedures once. SCP fom pre-crash should all be loaded, and can't progress
+    // until AM joins the cluster any SCPs that got added after we get the log folder list should be
+    // for a different start code.
+    final Set<ServerName> alreadyHasSCP = new HashSet<>();
+    long scpCount = 0;
+    for (ProcedureInfo procInfo : this.procedureExecutor.listProcedures() ) {
+      final Procedure proc = this.procedureExecutor.getProcedure(procInfo.getProcId());
+      if (proc != null) {
+        if (proc instanceof ServerCrashProcedure && !(proc.isFinished() || proc.isSuccess())) {
+          scpCount++;
+          alreadyHasSCP.add(((ServerCrashProcedure)proc).getServerName());
+        }
+      }
+    }
+    LOG.info("Restored proceduces include " + scpCount + " SCP covering " + alreadyHasSCP.size() +
+        " ServerName.");
+    
+ 
+    LOG.info("Checking " + previouslyFailedServers.size() + " previously failed servers (seen via wals) for existing SCP.");
+    // AM should be in "not yet init" and these should all be queued
     // Master has recovered hbase:meta region server and we put
     // other failed region servers in a queue to be handled later by SSH
     for (ServerName tmpServer : previouslyFailedServers) {
-      this.serverManager.processDeadServer(tmpServer, true);
+      if (alreadyHasSCP.contains(tmpServer)) {
+        LOG.info("Skipping failed server in FS because it already has a queued SCP: " + tmpServer);
+        this.serverManager.getDeadServers().add(tmpServer);

Review comment:
       > Does a queued SCP imply that the server should already be in the dead servers list? Or do we only add servers to that list when we create an SCP, and not when we recover them?
   
   We need to tell the master which SCPs are already in the procStore, so that they are not recreated.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

