virajjasani commented on a change in pull request #1684:
URL: https://github.com/apache/hbase/pull/1684#discussion_r422675456
##########
File path:
hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterShutdown.java
##########
@@ -151,19 +156,47 @@ public void testMasterShutdownBeforeStartingAnyRegionServer() throws Exception {
hbaseCluster = new LocalHBaseCluster(htu.getConfiguration(), options.getNumMasters(),
options.getNumRegionServers(), options.getMasterClass(), options.getRsClass());
final MasterThread masterThread = hbaseCluster.getMasters().get(0);
+
+ final CompletableFuture<Void> shutdownFuture = CompletableFuture.runAsync(() -> {
+ // Switching to master registry exacerbated a race in the master bootstrap that can result
+ // in a lost shutdown command (HBASE-8422, HBASE-23836). The race is essentially because
+ // the server manager in HMaster is not initialized by the time shutdown() RPC (below) is
+ // made to the master. The suspected reason as to why it was uncommon before HBASE-18095
+ // is because the connection creation with ZK registry is so slow that by then the server
+ // manager is usually init'ed in time for the RPC to be made. For now, adding an explicit
+ // wait() in the test, waiting for the server manager to become available.
+ final long timeout = TimeUnit.MINUTES.toMillis(10);
+ assertNotEquals("timeout waiting for server manager to become available.", -1,
+ htu.waitFor(timeout, () -> masterThread.getMaster().getServerManager() != null));
+
+ // Master has come up far enough that we can terminate it without creating a zombie.
+ final long result = htu.waitFor(timeout, 1000, () -> {
+ final Configuration conf = createResponsiveZkConfig(htu.getConfiguration());
+ LOG.debug("Attempting to establish connection.");
+ final CompletableFuture<AsyncConnection> connFuture =
+ ConnectionFactory.createAsyncConnection(conf);
+ try (final AsyncConnection conn = connFuture.join()) {
Review comment:
Hmm, since this is already running in ForkJoin, it doesn't matter much. I don't
have a strong opinion; it's just that we are using AsyncConnection and
AsyncAdmin directly rather than going through the Connection and Admin
interfaces. But yes, even htu.getConnection() should work once the ZK recovery
configs are put in htu directly.
However, since it's not much of a diff, do you want an addendum now, or shall
we wait for some time, watch the reports, and then add it, maybe in a week?
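
To illustrate the ForkJoin point, here is a minimal, stdlib-only sketch of a
blocking join() issued from inside a task that is itself already asynchronous.
It is not the test code: the class, the method names, and the "fake-connection"
string are hypothetical stand-ins for the AsyncConnection future in the diff,
and no HBase classes are involved.

```java
import java.util.concurrent.CompletableFuture;

class BlockingJoinSketch {

  // Hypothetical stand-in for the future returned by
  // ConnectionFactory.createAsyncConnection(conf) in the diff.
  static CompletableFuture<String> createFakeConnection() {
    return CompletableFuture.supplyAsync(() -> "fake-connection");
  }

  // Mirrors the shape of the test: the outer task is already asynchronous,
  // so the blocking join() on the inner future happens off the caller's thread.
  static String runShutdownTask() {
    CompletableFuture<String> outer = CompletableFuture.supplyAsync(() -> {
      // Blocks the thread running this task until the inner future completes.
      String conn = createFakeConnection().join();
      return "shutdown sent via " + conn;
    });
    // The caller only waits here, once, for the whole task.
    return outer.join();
  }

  public static void main(String[] args) {
    System.out.println(runShutdownTask());
  }
}
```

The only thread that blocks on the inner future is the one running the outer
task, which is why switching the test to the synchronous Connection and Admin
interfaces would change little in practice.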