[ https://issues.apache.org/jira/browse/HBASE-24656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148043#comment-17148043 ]
Michael Stack commented on HBASE-24656:
---------------------------------------
Here is how the shutdown looks when all goes well:
{code}
2020-06-29 11:04:42,194 DEBUG [zk-event-processor-pool2-t1]
zookeeper.ZKWatcher(580): @Before-0x100797d56510001 connected
2020-06-29 11:04:42,196 DEBUG [Time-limited test] zookeeper.ZKUtil(358):
@Before-0x100797d56510001, quorum=127.0.0.1:63507, baseZNode=/hbase Set watcher
on existing znode=/hbase/rs
2020-06-29 11:04:42,197 DEBUG [Time-limited test] zookeeper.ZKUtil(358):
@Before-0x100797d56510001, quorum=127.0.0.1:63507, baseZNode=/hbase Set watcher
on existing znode=/hbase/splitWAL
2020-06-29 11:04:42,198 DEBUG [Time-limited test] zookeeper.ZKUtil(358):
@Before-0x100797d56510001, quorum=127.0.0.1:63507, baseZNode=/hbase Set watcher
on existing znode=/hbase/backup-masters
2020-06-29 11:04:42,198 DEBUG [Time-limited test] zookeeper.ZKUtil(358):
@Before-0x100797d56510001, quorum=127.0.0.1:63507, baseZNode=/hbase Set watcher
on existing znode=/hbase/table
2020-06-29 11:04:42,199 DEBUG [Time-limited test] zookeeper.ZKUtil(358):
@Before-0x100797d56510001, quorum=127.0.0.1:63507, baseZNode=/hbase Set watcher
on existing znode=/hbase/draining
2020-06-29 11:04:42,200 DEBUG [Time-limited test] zookeeper.ZKUtil(358):
@Before-0x100797d56510001, quorum=127.0.0.1:63507, baseZNode=/hbase Set watcher
on existing znode=/hbase/master-maintenance
2020-06-29 11:04:42,213 DEBUG [Time-limited test-EventThread]
zookeeper.ZKWatcher(555): @Before-0x100797d56510001, quorum=127.0.0.1:63507,
baseZNode=/hbase Received ZooKeeper Event, type=NodeDeleted,
state=SyncConnected, path=/hbase/master-maintenance
2020-06-29 11:04:42,213 DEBUG [Time-limited test-EventThread]
zookeeper.ZKWatcher(555): master:54310-0x100797d56510000,
quorum=127.0.0.1:63507, baseZNode=/hbase Received ZooKeeper Event,
type=NodeChildrenChanged, state=SyncConnected, path=/hbase
2020-06-29 11:04:42,213 DEBUG [Time-limited test-EventThread]
zookeeper.ZKWatcher(555): @Before-0x100797d56510001, quorum=127.0.0.1:63507,
baseZNode=/hbase Received ZooKeeper Event, type=NodeChildrenChanged,
state=SyncConnected, path=/hbase
2020-06-29 11:04:42,214 DEBUG [Time-limited test-EventThread]
zookeeper.ZKWatcher(555): @Before-0x100797d56510001, quorum=127.0.0.1:63507,
baseZNode=/hbase Received ZooKeeper Event, type=NodeDeleted,
state=SyncConnected, path=/hbase/draining
2020-06-29 11:04:42,214 DEBUG [zk-event-processor-pool1-t1]
zookeeper.ZKUtil(448): master:54310-0x100797d56510000, quorum=127.0.0.1:63507,
baseZNode=/hbase Unable to list children of znode /hbase because node does not
exist (not an error)
{code}
Here is the sequence when the test fails:
{code}
2020-06-29 15:21:07,638 DEBUG [zk-event-processor-pool2-t1]
zookeeper.ZKWatcher(580): @Before-0x100c741374b0001 connected
2020-06-29 15:21:07,642 DEBUG [Time-limited test] zookeeper.ZKUtil(358):
@Before-0x100c741374b0001, quorum=127.0.0.1:62960, baseZNode=/hbase Set watcher
on existing znode=/hbase/rs
2020-06-29 15:21:07,643 DEBUG [Time-limited test] zookeeper.ZKUtil(358):
@Before-0x100c741374b0001, quorum=127.0.0.1:62960, baseZNode=/hbase Set watcher
on existing znode=/hbase/splitWAL
2020-06-29 15:21:07,645 DEBUG [Time-limited test] zookeeper.ZKUtil(358):
@Before-0x100c741374b0001, quorum=127.0.0.1:62960, baseZNode=/hbase Set watcher
on existing znode=/hbase/backup-masters
2020-06-29 15:21:07,646 DEBUG [Time-limited test] zookeeper.ZKUtil(358):
@Before-0x100c741374b0001, quorum=127.0.0.1:62960, baseZNode=/hbase Set watcher
on existing znode=/hbase/table
2020-06-29 15:21:07,647 DEBUG [Time-limited test] zookeeper.ZKUtil(358):
@Before-0x100c741374b0001, quorum=127.0.0.1:62960, baseZNode=/hbase Set watcher
on existing znode=/hbase/draining
2020-06-29 15:21:07,649 DEBUG [Time-limited test] zookeeper.ZKUtil(358):
@Before-0x100c741374b0001, quorum=127.0.0.1:62960, baseZNode=/hbase Set watcher
on existing znode=/hbase/master-maintenance
2020-06-29 15:21:07,666 DEBUG [Time-limited test-EventThread]
zookeeper.ZKWatcher(555): @Before-0x100c741374b0001, quorum=127.0.0.1:62960,
baseZNode=/hbase Received ZooKeeper Event, type=NodeChildrenChanged,
state=SyncConnected, path=/hbase/backup-masters
2020-06-29 15:21:07,667 DEBUG [master/asf905:0:becomeActiveMaster]
zookeeper.ZKUtil(358): master:33965-0x100c741374b0000, quorum=127.0.0.1:62960,
baseZNode=/hbase Set watcher on existing
znode=/hbase/backup-masters/asf905.gq1.ygridcore.net,33965,1593444064742
2020-06-29 15:21:07,701 INFO [Time-limited test] zookeeper.ZKUtil(1809): multi
exception: org.apache.zookeeper.KeeperException$NotEmptyException:
KeeperErrorCode = Directory not empty; running operations sequentially
{code}
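The backup master registers (the NodeChildrenChanged event on /hbase/backup-masters,
followed by the master setting a watcher on its own znode under it) after the delete
has started, so the multi delete finds /hbase/backup-masters not empty and fails with
NotEmptyException. The retry should help here. Let me push.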
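To illustrate the race and why a bounded retry should get past it, here is a minimal sketch. It does not use the ZooKeeper or HBase ZKUtil APIs: FakeTree, NotEmptyException, runWithRetry and simulateTeardown are hypothetical stand-ins that model only one rule, that a znode which still has children cannot be deleted. The "snapshot the children, delete them, then delete the parent" order is an assumption about how the teardown purges children, not a copy of the real cleanup code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch only: every type and method here is hypothetical, NOT the ZooKeeper
// or HBase ZKUtil API. It models the single rule that matters for this flake:
// a node that still has children cannot be deleted.
class TeardownRetrySketch {

  /** Hypothetical stand-in for org.apache.zookeeper.KeeperException.NotEmptyException. */
  static class NotEmptyException extends Exception {
    NotEmptyException(String path) {
      super("Directory not empty for " + path);
    }
  }

  /** Hypothetical in-memory tree: parent path -> names of its (leaf) children. */
  static class FakeTree {
    final Map<String, Set<String>> children = new HashMap<>();

    void create(String parent, String child) {
      children.computeIfAbsent(parent, k -> new HashSet<>()).add(child);
    }

    /** Snapshot the children, let 'interleave' run, delete the snapshot, then the parent. */
    void deleteRecursively(String parent, Runnable interleave) throws NotEmptyException {
      Set<String> current = children.get(parent);
      if (current == null) {
        return; // Node already gone: nothing left to clean up.
      }
      Set<String> snapshot = new HashSet<>(current);
      interleave.run(); // e.g. a backup master registering mid-teardown
      current.removeAll(snapshot);
      if (!current.isEmpty()) {
        throw new NotEmptyException(parent); // a child arrived after the snapshot
      }
      children.remove(parent);
    }
  }

  @FunctionalInterface
  interface Cleanup {
    void run() throws NotEmptyException;
  }

  /** Runs cleanup up to maxAttempts times; returns the 1-based attempt that succeeded. */
  static int runWithRetry(Cleanup cleanup, int maxAttempts) throws NotEmptyException {
    for (int attempt = 1; ; attempt++) {
      try {
        cleanup.run();
        return attempt;
      } catch (NotEmptyException e) {
        if (attempt >= maxAttempts) {
          throw e; // Give up: the tree kept changing under us.
        }
        // A real teardown might back off (sleep) here before retrying.
      }
    }
  }

  /** Models the failing run: one backup master registers during the first delete. */
  static int simulateTeardown(int maxAttempts) throws NotEmptyException {
    FakeTree tree = new FakeTree();
    tree.children.put("/hbase/backup-masters", new HashSet<>()); // exists, initially empty
    boolean[] registered = {false};
    Runnable backupMasterArrives = () -> {
      if (!registered[0]) { // register only once, like the real backup master
        registered[0] = true;
        tree.create("/hbase/backup-masters", "asf905.gq1.ygridcore.net,33965,1593444064742");
      }
    };
    return runWithRetry(
        () -> tree.deleteRecursively("/hbase/backup-masters", backupMasterArrives), maxAttempts);
  }

  public static void main(String[] args) throws NotEmptyException {
    System.out.println("cleanup succeeded on attempt " + simulateTeardown(3));
    try {
      simulateTeardown(1); // no retry: reproduces the flaky failure
    } catch (NotEmptyException e) {
      System.out.println("without retry: " + e.getMessage());
    }
  }
}
```

In this model the first attempt fails with "Directory not empty for /hbase/backup-masters" because the child appears between the snapshot and the parent delete, and the second attempt succeeds because the late child is now in the snapshot. The retry is bounded rather than loop-until-success so that a teardown that keeps racing still fails the test instead of hanging.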
> [Flakey Tests] branch-2 TestMasterNoCluster.testStopDuringStart
> ---------------------------------------------------------------
>
> Key: HBASE-24656
> URL: https://issues.apache.org/jira/browse/HBASE-24656
> Project: HBase
> Issue Type: Bug
> Reporter: Michael Stack
> Priority: Major
>
> org.apache.hadoop.hbase.master.TestMasterNoCluster.testStopDuringStart is
> currently flaky, and only on branch-2. It fails here:
> Error Message
> KeeperErrorCode = Directory not empty for /hbase/backup-masters
> Stacktrace
> org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode =
> Directory not empty for /hbase/backup-masters
> at org.apache.hadoop.hbase.master.TestMasterNoCluster.tearDown(TestMasterNoCluster.java:121)
> I can see the ZK events in teardown as we purge children as part of cleanup.
> I can also see that the backup master registers later. Other than that, the
> log is opaque on why the teardown is failing. This is just cleanup, so I am
> adding a retry to see if that helps.