[jira] [Commented] (HBASE-24656) [Flakey Tests] branch-2 TestMasterNoCluster.testStopDuringStart

Michael Stack (Jira) Mon, 29 Jun 2020 11:10:58 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-24656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148043#comment-17148043
 ]


Michael Stack commented on HBASE-24656:
---------------------------------------

Here is how the shutdown looks when all goes well:
{code}
 2020-06-29 11:04:42,194 DEBUG [zk-event-processor-pool2-t1] 
zookeeper.ZKWatcher(580): @Before-0x100797d56510001 connected
 2020-06-29 11:04:42,196 DEBUG [Time-limited test] zookeeper.ZKUtil(358): 
@Before-0x100797d56510001, quorum=127.0.0.1:63507, baseZNode=/hbase Set watcher 
on existing znode=/hbase/rs
 2020-06-29 11:04:42,197 DEBUG [Time-limited test] zookeeper.ZKUtil(358): 
@Before-0x100797d56510001, quorum=127.0.0.1:63507, baseZNode=/hbase Set watcher 
on existing znode=/hbase/splitWAL
 2020-06-29 11:04:42,198 DEBUG [Time-limited test] zookeeper.ZKUtil(358): 
@Before-0x100797d56510001, quorum=127.0.0.1:63507, baseZNode=/hbase Set watcher 
on existing znode=/hbase/backup-masters
 2020-06-29 11:04:42,198 DEBUG [Time-limited test] zookeeper.ZKUtil(358): 
@Before-0x100797d56510001, quorum=127.0.0.1:63507, baseZNode=/hbase Set watcher 
on existing znode=/hbase/table
 2020-06-29 11:04:42,199 DEBUG [Time-limited test] zookeeper.ZKUtil(358): 
@Before-0x100797d56510001, quorum=127.0.0.1:63507, baseZNode=/hbase Set watcher 
on existing znode=/hbase/draining
 2020-06-29 11:04:42,200 DEBUG [Time-limited test] zookeeper.ZKUtil(358): 
@Before-0x100797d56510001, quorum=127.0.0.1:63507, baseZNode=/hbase Set watcher 
on existing znode=/hbase/master-maintenance
 2020-06-29 11:04:42,213 DEBUG [Time-limited test-EventThread] 
zookeeper.ZKWatcher(555): @Before-0x100797d56510001, quorum=127.0.0.1:63507, 
baseZNode=/hbase Received ZooKeeper Event, type=NodeDeleted, 
state=SyncConnected, path=/hbase/master-maintenance
 2020-06-29 11:04:42,213 DEBUG [Time-limited test-EventThread] 
zookeeper.ZKWatcher(555): master:54310-0x100797d56510000, 
quorum=127.0.0.1:63507, baseZNode=/hbase Received ZooKeeper Event, 
type=NodeChildrenChanged, state=SyncConnected, path=/hbase
 2020-06-29 11:04:42,213 DEBUG [Time-limited test-EventThread] 
zookeeper.ZKWatcher(555): @Before-0x100797d56510001, quorum=127.0.0.1:63507, 
baseZNode=/hbase Received ZooKeeper Event, type=NodeChildrenChanged, 
state=SyncConnected, path=/hbase
 2020-06-29 11:04:42,214 DEBUG [Time-limited test-EventThread] 
zookeeper.ZKWatcher(555): @Before-0x100797d56510001, quorum=127.0.0.1:63507, 
baseZNode=/hbase Received ZooKeeper Event, type=NodeDeleted, 
state=SyncConnected, path=/hbase/draining
 2020-06-29 11:04:42,214 DEBUG [zk-event-processor-pool1-t1] 
zookeeper.ZKUtil(448): master:54310-0x100797d56510000, quorum=127.0.0.1:63507, 
baseZNode=/hbase Unable to list children of znode /hbase because node does not 
exist (not an error)
{code}

Here is the sequence when the test fails:
{code}
2020-06-29 15:21:07,638 DEBUG [zk-event-processor-pool2-t1] 
zookeeper.ZKWatcher(580): @Before-0x100c741374b0001 connected
2020-06-29 15:21:07,642 DEBUG [Time-limited test] zookeeper.ZKUtil(358): 
@Before-0x100c741374b0001, quorum=127.0.0.1:62960, baseZNode=/hbase Set watcher 
on existing znode=/hbase/rs
2020-06-29 15:21:07,643 DEBUG [Time-limited test] zookeeper.ZKUtil(358): 
@Before-0x100c741374b0001, quorum=127.0.0.1:62960, baseZNode=/hbase Set watcher 
on existing znode=/hbase/splitWAL
2020-06-29 15:21:07,645 DEBUG [Time-limited test] zookeeper.ZKUtil(358): 
@Before-0x100c741374b0001, quorum=127.0.0.1:62960, baseZNode=/hbase Set watcher 
on existing znode=/hbase/backup-masters
2020-06-29 15:21:07,646 DEBUG [Time-limited test] zookeeper.ZKUtil(358): 
@Before-0x100c741374b0001, quorum=127.0.0.1:62960, baseZNode=/hbase Set watcher 
on existing znode=/hbase/table
2020-06-29 15:21:07,647 DEBUG [Time-limited test] zookeeper.ZKUtil(358): 
@Before-0x100c741374b0001, quorum=127.0.0.1:62960, baseZNode=/hbase Set watcher 
on existing znode=/hbase/draining
2020-06-29 15:21:07,649 DEBUG [Time-limited test] zookeeper.ZKUtil(358): 
@Before-0x100c741374b0001, quorum=127.0.0.1:62960, baseZNode=/hbase Set watcher 
on existing znode=/hbase/master-maintenance
2020-06-29 15:21:07,666 DEBUG [Time-limited test-EventThread] 
zookeeper.ZKWatcher(555): @Before-0x100c741374b0001, quorum=127.0.0.1:62960, 
baseZNode=/hbase Received ZooKeeper Event, type=NodeChildrenChanged, 
state=SyncConnected, path=/hbase/backup-masters
2020-06-29 15:21:07,667 DEBUG [master/asf905:0:becomeActiveMaster] 
zookeeper.ZKUtil(358): master:33965-0x100c741374b0000, quorum=127.0.0.1:62960, 
baseZNode=/hbase Set watcher on existing 
znode=/hbase/backup-masters/asf905.gq1.ygridcore.net,33965,1593444064742
2020-06-29 15:21:07,701 INFO  [Time-limited test] zookeeper.ZKUtil(1809): multi 
exception: org.apache.zookeeper.KeeperException$NotEmptyException: 
KeeperErrorCode = Directory not empty; running operations sequentially 
{code}

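The backup master registers its znode under /hbase/backup-masters after the 
teardown delete has already started, so the delete of the parent fails with 
NotEmptyException. The retry should help here. Let me push.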
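
For illustration only, here is a minimal sketch of the race and of why a retry 
gets past it. It is not the patch for this issue and it does not use the HBase 
or ZooKeeper APIs: FakeZk, NotEmptyException, registerLate, deleteWithRetry and 
runScenario are all hypothetical names for an in-memory model, and the znode 
name "host,33965,1" is made up. The model assumes a recursive delete that lists 
children first and deletes the parent last, which leaves a window for a late 
child to appear.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class RetryDeleteSketch {

    /** Stand-in for KeeperException.NotEmptyException; not the real ZooKeeper class. */
    static class NotEmptyException extends Exception {
        NotEmptyException(String path) {
            super("Directory not empty for " + path);
        }
    }

    /** Tiny in-memory znode tree. Paths are plain strings such as "/hbase/rs". */
    static class FakeZk {
        final Set<String> nodes = new TreeSet<>();
        private String lateChild; // created once, just before its parent is deleted

        void registerLate(String path) {
            lateChild = path;
        }

        List<String> children(String path) {
            String prefix = path + "/";
            List<String> result = new ArrayList<>();
            for (String n : nodes) {
                // Direct children only: prefix matches and no further '/' follows.
                if (n.startsWith(prefix) && n.indexOf('/', prefix.length()) < 0) {
                    result.add(n);
                }
            }
            return result;
        }

        void delete(String path) throws NotEmptyException {
            if (lateChild != null && lateChild.startsWith(path + "/")) {
                nodes.add(lateChild); // simulate the backup master registering late
                lateChild = null;
            }
            if (!children(path).isEmpty()) {
                throw new NotEmptyException(path);
            }
            nodes.remove(path);
        }
    }

    /** Lists children, deletes them depth-first, then deletes the parent. */
    static void deleteRecursively(FakeZk zk, String path) throws NotEmptyException {
        for (String child : zk.children(path)) {
            deleteRecursively(zk, child);
        }
        zk.delete(path);
    }

    /** Re-runs the whole recursive delete on NotEmptyException; returns attempts used. */
    static int deleteWithRetry(FakeZk zk, String path, int maxAttempts)
            throws NotEmptyException {
        for (int attempt = 1; ; attempt++) {
            try {
                deleteRecursively(zk, path);
                return attempt;
            } catch (NotEmptyException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: surface the failure
                }
            }
        }
    }

    /** Builds the scenario and returns attempts used, or -1 if every attempt failed. */
    static int runScenario(int maxAttempts) {
        FakeZk zk = new FakeZk();
        zk.nodes.add("/hbase");
        zk.nodes.add("/hbase/backup-masters");
        zk.nodes.add("/hbase/rs");
        zk.registerLate("/hbase/backup-masters/host,33965,1");
        try {
            return deleteWithRetry(zk, "/hbase", maxAttempts);
        } catch (NotEmptyException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println("attempts with retry: " + runScenario(3));
        System.out.println("attempts without retry: " + runScenario(1));
    }
}
```

In this model the first pass fails on /hbase/backup-masters because the child 
shows up after the listing; the second pass lists again, sees the new child, 
and clears it. With a single attempt the failure surfaces, as it does in the 
test teardown.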

> [Flakey Tests] branch-2 TestMasterNoCluster.testStopDuringStart
> ---------------------------------------------------------------
>
>                 Key: HBASE-24656
>                 URL: https://issues.apache.org/jira/browse/HBASE-24656
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Michael Stack
>            Priority: Major
>
> org.apache.hadoop.hbase.master.TestMasterNoCluster.testStopDuringStart is 
> (only) flakey on branch-2 currently. Fails here:
> Error Message
> KeeperErrorCode = Directory not empty for /hbase/backup-masters
> Stacktrace
> org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = 
> Directory not empty for /hbase/backup-masters
>       at 
> org.apache.hadoop.hbase.master.TestMasterNoCluster.tearDown(TestMasterNoCluster.java:121)
> I can see the zk events in teardown as we purge children as part of cleanup. 
> Can also see that the backup master registers later. Other than that, log is 
> opaque on why the teardown is failing. This is just clean up so adding in 
> retry to see if that helps.



