[jira] [Commented] (HBASE-20362) TestMasterShutdown.testMasterShutdownBeforeStartingAnyRegionServer is flaky

Duo Zhang (JIRA) Sun, 08 Apr 2018 00:19:36 -0700

    [ https://issues.apache.org/jira/browse/HBASE-20362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429653#comment-16429653 ]


Duo Zhang commented on HBASE-20362:
-----------------------------------

OK, we have enough errors in the logs. I think the problem is that we do not 
have any region servers in this test, so we call master.stop immediately in 
serverManager.shutdownCluster, and then HRegionServer.run starts closing the 
related resources, such as the ZooKeeper connection, the RPC server, and so on. 
If it runs quickly enough to stop the RPC server before we send the response 
back, the admin.shutdown call fails, so we never call cluster.waitOnMaster, 
and that causes the test to fail.

I do not think this is a big deal, so I prefer to treat this as a test case 
problem. In fact, a shutdown call that ends with connection refused is 
expected, since the server shuts itself down...

See the SHUTDOWN command for Redis:

https://redis.io/commands/shutdown

{noformat}
Return value
Simple string reply on error. On success nothing is returned since the server 
quits and the connection is closed.
{noformat}

So I think here we should move the cluster.waitOnMaster(MASTER_INDEX); call out 
of the try block. We should also add a comment to Admin.shutdown to indicate 
that you may not get a response because the server has already shut itself 
down, and that this is expected.
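
A rough sketch of the control flow I have in mind, not the actual patch. 
AdminLike and ClusterLike are hypothetical stand-ins for Admin and the mini 
cluster so that the snippet runs without HBase on the classpath, and the 
Javadoc wording on shutdown() is only a suggestion. The point is that the wait 
sits after the try/catch, so a lost response cannot skip it.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

public class ShutdownSketch {
  /** Hypothetical stand-in for the part of Admin used by the test. */
  interface AdminLike {
    /**
     * Shuts down the cluster. The caller may not get a response, because the
     * server may close its RPC server before replying. This is expected.
     */
    void shutdown() throws IOException;
  }

  /** Hypothetical stand-in for the cluster object used by the test. */
  interface ClusterLike {
    void waitOnMaster(int index);
  }

  static final int MASTER_INDEX = 0;

  /** Requests shutdown, tolerates a lost response, then always waits for the master. */
  static void shutdownAndWait(AdminLike admin, ClusterLike cluster) {
    try {
      admin.shutdown();
    } catch (IOException e) {
      // Expected when the master stops its RPC server before replying;
      // report it instead of swallowing it silently.
      System.out.println("shutdown call failed, master is probably stopping: " + e);
    }
    // Outside the try block, so a failed shutdown call cannot skip the wait.
    cluster.waitOnMaster(MASTER_INDEX);
  }

  public static void main(String[] args) {
    AtomicBoolean waited = new AtomicBoolean(false);
    // Simulate the race: the RPC fails because the server is already gone.
    shutdownAndWait(
        () -> { throw new IOException("Connection refused"); },
        index -> waited.set(true));
    System.out.println("waited on master: " + waited.get());
  }
}
```

In the real test the same shape would apply to the body of the shutdown 
thread: keep the try-with-resources around the Connection and Admin, log the 
exception in the catch block, and call cluster.waitOnMaster(MASTER_INDEX) 
after it.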

> TestMasterShutdown.testMasterShutdownBeforeStartingAnyRegionServer is flaky
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-20362
>                 URL: https://issues.apache.org/jira/browse/HBASE-20362
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Duo Zhang
>            Priority: Major
>
> {code}
>     Thread shutdownThread = new Thread("Shutdown-Thread") {
>       @Override
>       public void run() {
>         LOG.info("Before call to shutdown master");
>         try {
>           try (Connection connection =
>               ConnectionFactory.createConnection(util.getConfiguration())) {
>             try (Admin admin = connection.getAdmin()) {
>               admin.shutdown();
>             }
>           }
>           LOG.info("After call to shutdown master");
>           cluster.waitOnMaster(MASTER_INDEX);
>         } catch (Exception e) {
>         }
>       }
>     };
> {code}
> https://builds.apache.org/job/HBASE-Flaky-Tests/28970/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.master.TestMasterShutdown-output.txt
> In the output for a failed run, we only have 'Before call to shutdown 
> master' but no 'After call to shutdown master', so I think something must 
> go wrong when calling admin.shutdown, but the catch block above just 
> ignores the exception.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
