[jira] [Commented] (HBASE-23808) [Flakey Test] TestMasterShutdown#testMasterShutdownBeforeStartingAnyRegionServer

Michael Stack (Jira) Wed, 25 Mar 2020 22:50:11 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-23808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067372#comment-17067372
 ]


Michael Stack commented on HBASE-23808:
---------------------------------------

I just saw this bit in TestMasterShutdown:

{code}
        // Switching to master registry exacerbated a race in the master bootstrap that can result
        // in a lost shutdown command (HBASE-8422, HBASE-23836). The race is essentially because
        // the server manager in HMaster is not initialized by the time shutdown() RPC (below) is
        // made to the master. The suspected reason as to why it was uncommon before HBASE-18095
        // is because the connection creation with ZK registry is so slow that by then the server
        // manager is usually init'ed in time for the RPC to be made. For now, adding an explicit
        // wait() in the test, waiting for the server manager to become available.
        final long timeout = TimeUnit.MINUTES.toMillis(10);
        assertNotEquals("timeout waiting for server manager to become available.",
          -1, Waiter.waitFor(htu.getConfiguration(), timeout,
            () -> masterThread.getMaster().getServerManager() != null...
{code}

... which probably explains the 'hang' I see.
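
For illustration only, a minimal standalone sketch of the kind of polling wait the excerpt above relies on. It is not the HBase Waiter class; PollingWait, its waitFor signature, and the polling interval are hypothetical names and values chosen for this example. The only thing taken from the test is the convention that -1 signals a timeout. It shows why a condition that never becomes true looks like a hang for the full timeout (10 minutes in the test).

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for a test-side polling wait; NOT the HBase Waiter class. */
final class PollingWait {
  private PollingWait() {}

  /**
   * Polls the condition every intervalMs until it is true or timeoutMs has elapsed.
   * Returns the elapsed milliseconds on success, or -1 on timeout or interruption.
   */
  static long waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
    final long start = System.nanoTime();
    final long deadline = start + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (true) {
      if (condition.getAsBoolean()) {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
      }
      if (System.nanoTime() - deadline >= 0) {
        return -1; // condition never became true: the caller blocked for the whole timeout
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return -1;
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // Stand-in for a server manager that is initialized late by another thread.
    AtomicReference<Object> serverManager = new AtomicReference<>();
    Thread init = new Thread(() -> {
      try {
        Thread.sleep(200);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      serverManager.set(new Object());
    });
    init.start();

    long waited = waitFor(5_000, 50, () -> serverManager.get() != null);
    System.out.println(waited >= 0 ? "server manager available" : "timed out");
    init.join();
  }
}
```

If the master has already gone down and the reference is never set, a wait like this simply blocks until the deadline, which would match the 'hang'.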

In RSProcedureDispatcher#start, we were getting NPEs... which correlated with the 
test failures. I added a catch there that returns a failed start; the NPEs seemed 
to happen because the Master had already been stopped. Made a subtask adding more 
debug logging for now while the tests run overnight.
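
A rough sketch of the "catch and return a failed start" idea, under stated assumptions. DispatcherSketch and its Services dependency are hypothetical names made up for this example and do not reflect the actual RSProcedureDispatcher code or where its NPE comes from. The assumption is only that start() dereferences something that is null once the owner has stopped.

```java
import java.util.function.Supplier;

/** Hypothetical dispatcher; names are illustrative, NOT HBase's RSProcedureDispatcher. */
final class DispatcherSketch {
  /** Hypothetical dependency that is no longer available once the owner has stopped. */
  interface Services {
    String serverName();
  }

  private final Supplier<Services> services;
  private boolean running;

  DispatcherSketch(Supplier<Services> services) {
    this.services = services;
  }

  /** Returns false instead of propagating an NPE when the owner has already stopped. */
  boolean start() {
    try {
      // Throws NullPointerException if the supplier yields null (owner already stopped).
      String name = services.get().serverName();
      System.out.println("dispatcher started for " + name);
      running = true;
      return true;
    } catch (NullPointerException e) {
      // Report a failed start rather than letting the exception escape.
      running = false;
      return false;
    }
  }

  boolean isRunning() {
    return running;
  }
}
```

Catching the NPE only reports the condition; it does not close the underlying race between start and shutdown, hence the extra debug in the subtask.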

> [Flakey Test] 
> TestMasterShutdown#testMasterShutdownBeforeStartingAnyRegionServer
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-23808
>                 URL: https://issues.apache.org/jira/browse/HBASE-23808
>             Project: HBase
>          Issue Type: Test
>          Components: test
>    Affects Versions: 2.3.0
>            Reporter: Nick Dimiduk
>            Assignee: Nick Dimiduk
>            Priority: Major
>             Fix For: 3.0.0, 2.3.0, 2.2.4
>
>         Attachments: 
> TEST-org.apache.hadoop.hbase.master.TestMasterShutdown.xml
>
>
> Reproduces locally from time to time. Not much to go on here. Looks like the 
> test is trying to do some fancy HBase cluster initialization order on top of 
> a mini-cluster. Failure seems related to trying to start the HBase master 
> before HDFS is fully initialized.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
