[ https://issues.apache.org/jira/browse/HBASE-23808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067262#comment-17067262 ]

Michael Stack commented on HBASE-23808:
---------------------------------------

The nature of this was changed by me here:

commit 50161f2de4a4efb48e250b07dd609175f82a7331
Author: stack <st...@apache.org>
Date:   Mon Mar 23 11:45:46 2020 -0700

    HBASE-24034 [Flakey Tests] A couple of fixes and cleanups



diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
index 2f4bea8799..08043ef4e2 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
@@ -1541,7 +1541,9 @@ public class HMaster extends HRegionServer implements MasterServices {
     // startProcedureExecutor. See the javadoc for finishActiveMasterInitialization for more
     // details.
     procedureExecutor.init(numThreads, abortOnCorruption);
-    procEnv.getRemoteDispatcher().start();
+    if (!procEnv.getRemoteDispatcher().start()) {
+      throw new HBaseIOException("Failed start of remote dispatcher");
+    }
   }

diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RSProcedureDispatcher.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RSProcedureDispatcher.java
index b469cb86e6..10b823ca67 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RSProcedureDispatcher.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RSProcedureDispatcher.java
@@ -29,6 +29,7 @@ import org.apache.hadoop.hbase.client.RegionInfo;
 import org.apache.hadoop.hbase.ipc.ServerNotRunningYetException;
 import org.apache.hadoop.hbase.master.MasterServices;
 import org.apache.hadoop.hbase.master.ServerListener;
+import org.apache.hadoop.hbase.master.ServerManager;
 import org.apache.hadoop.hbase.procedure2.RemoteProcedureDispatcher;
 import org.apache.hadoop.hbase.regionserver.RegionServerAbortedException;
 import org.apache.hadoop.hbase.regionserver.RegionServerStoppedException;
@@ -93,11 +94,16 @@ public class RSProcedureDispatcher
     if (!super.start()) {
       return false;
     }
-
-    master.getServerManager().registerListener(this);
-    procedureEnv = master.getMasterProcedureExecutor().getEnvironment();
-    for (ServerName serverName: master.getServerManager().getOnlineServersList()) {
-      addNode(serverName);
+    // Around startup, if failed, some of the below may be set back to null so NPE is possible.
+    try {
+      master.getServerManager().registerListener(this);
+      procedureEnv = master.getMasterProcedureExecutor().getEnvironment();
+      for (ServerName serverName : master.getServerManager().getOnlineServersList()) {
+        addNode(serverName);
+      }
+    } catch (Exception e) {
+      LOG.info("Failed start", e);
+      return false;
     }
     return true;
   }


So now we get an HBaseIOException out of the run loop, which causes us to
exit. But in this case we've set 'stop' on the master, so we don't 'abort' the
Master; we try to run a nice shutdown instead. We are hung up somewhere,
though, and I'm not sure where as yet.
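
For illustration only, here is a minimal standalone sketch of the control flow
described above. It is not HBase code: Dispatcher, MiniMaster and simulate are
hypothetical names, and a plain IOException stands in for HBaseIOException. It
only shows how a failed start() turns into an exception, and how a previously
set 'stop' flag selects a clean shutdown over an abort.

```java
import java.io.IOException;

final class StartupFailureSketch {
  /** Hypothetical dispatcher that reports a failed start via its return value. */
  interface Dispatcher {
    boolean start();
  }

  /** Hypothetical stand-in for the master; this is not the HMaster API. */
  static final class MiniMaster {
    private final Dispatcher dispatcher;
    private volatile boolean stopped = false;

    MiniMaster(Dispatcher dispatcher) {
      this.dispatcher = dispatcher;
    }

    /** Marks the master as asked to stop, as the shutdown request would. */
    void stop() {
      stopped = true;
    }

    /** Mirrors the patched call site: a false return becomes an exception. */
    private void startProcedureExecutor() throws IOException {
      if (!dispatcher.start()) {
        throw new IOException("Failed start of remote dispatcher");
      }
    }

    /** Returns which path the run loop would take: running, shutdown or abort. */
    String run() {
      try {
        startProcedureExecutor();
        return "running";
      } catch (IOException e) {
        // Assumption taken from the comment above: if 'stop' is already set,
        // the failure leads to a clean shutdown rather than an abort.
        return stopped ? "shutdown" : "abort";
      }
    }
  }

  /** Runs one scenario and reports the path taken. */
  static String simulate(boolean stopRequested, boolean dispatcherStarts) {
    MiniMaster master = new MiniMaster(() -> dispatcherStarts);
    if (stopRequested) {
      master.stop();
    }
    return master.run();
  }
}
```

In this sketch the case that matches the test is simulate(true, false): the
dispatcher fails to start after 'stop' was set, so the shutdown path is taken.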

Meantime, the Admin shutdown command is hung; it hasn't completed, as though
we are not sending back a response.

I can't make this fail locally. Been at it a while now.

Will keep at it. Probably good to push more debug.
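
One kind of extra debug that could help find where the shutdown hangs is a dump
of every live thread's stack at the point the test gives up. The sketch below
uses only the JDK; ThreadDumpSketch and dumpAllThreads are hypothetical names,
and this is an assumption about what might be useful, not what was added to
the test.

```java
import java.util.Map;

final class ThreadDumpSketch {
  /** Builds a text dump of all live threads with their state and stack frames. */
  static String dumpAllThreads() {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<Thread, StackTraceElement[]> entry
        : Thread.getAllStackTraces().entrySet()) {
      Thread thread = entry.getKey();
      sb.append('"').append(thread.getName()).append('"')
          .append(" state=").append(thread.getState())
          .append(" daemon=").append(thread.isDaemon())
          .append('\n');
      // Each frame shows where the thread currently is, which is what
      // matters when looking for a thread blocked during shutdown.
      for (StackTraceElement frame : entry.getValue()) {
        sb.append("    at ").append(frame).append('\n');
      }
    }
    return sb.toString();
  }
}
```

Logging the returned string when the shutdown wait times out would show which
threads are still alive and what they are blocked on.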

Thanks for taking a look [~bharathv]. Always good to get a second opinion.






> [Flakey Test] TestMasterShutdown#testMasterShutdownBeforeStartingAnyRegionServer
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-23808
>                 URL: https://issues.apache.org/jira/browse/HBASE-23808
>             Project: HBase
>          Issue Type: Test
>          Components: test
>    Affects Versions: 2.3.0
>            Reporter: Nick Dimiduk
>            Assignee: Nick Dimiduk
>            Priority: Major
>             Fix For: 3.0.0, 2.3.0, 2.2.4
>
>         Attachments: TEST-org.apache.hadoop.hbase.master.TestMasterShutdown.xml
>
>
> Reproduces locally from time to time. Not much to go on here. Looks like the 
> test is trying to do some fancy HBase cluster initialization order on top of 
> a mini-cluster. Failure seems related to trying to start the HBase master 
> before HDFS is fully initialized.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
