[
https://issues.apache.org/jira/browse/ZOOKEEPER-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414284#comment-15414284
]
Michael Han commented on ZOOKEEPER-2508:
----------------------------------------
Thanks for the quick update [~arshad.mohammad]. New tests are very clean with
the abstraction, comparing the old tests where each test reinvented the wheel.
I have a few more comments on 04.patch:
* In "ObserverTest.testObserver" two checks are removed:
{code}
Assert.assertNotSame("Client is still connected to non-quorate cluster",
KeeperState.SyncConnected,lastEvent.getState());
Assert.assertTrue("Client didn't reconnect to quorate ensemble (state was" +
lastEvent.getState() + ")",
(KeeperState.SyncConnected==lastEvent.getState() ||
KeeperState.Expired==lastEvent.getState()));
{code}
This is a functional change to the test case because the new abstraction only
checks initially when the ZK handle was created. The removed checks seems very
reasonable to have from a functional testing perspective, so I think this
should be fixed. If abstracting these checks is cumbersome maybe we can just
leave this test case as it is without using the new createZKClient introduced
in this patch.
* For 'public static ZooKeeper createZKClient(String cxnString, int
sessionTimeout)', maybe using Assert explicitly is better than throwing a
TimeoutException? For example:
{code}
public static ZooKeeper createZKClient(String cxnString, int sessionTimeout)
throws Exception {
CountdownWatcher watcher = new CountdownWatcher();
ZooKeeper zk = new ZooKeeper(cxnString, sessionTimeout, watcher);
try {
watcher.waitForConnected(CONNECTION_TIMEOUT);
} catch (InterruptedException e) {
// Ignore interrupt.. not interested.
} catch (TimeoutException e) {
Assert.fail("ZooKeeper client can not connect to " + cxnString);
}
return zk;
}
{code}
By using Assert we make the contract clear that we always expect a functional
connected ZK client before proceeding to execute rest of tests; it also makes
the test report easier to read, in case such assumption is violated as we would
know what exact expectation was failed (instead of digging into code and
figured out why).
* I am thinking we should make the 'CONNECTION_TIMEOUT' a parameter in the
interface of 'createZKClient', so each test case can tune the timeout based on
their needs, given that there are a lot of variations of timeouts per test.
A side note regarding test timeout variation among tests: some requires as
small as 5 seconds while others require 10, 20, 30 or more. This might be a by
design but could also be an optimization opportunity for us to normalize the
timeout value so tests could run faster. This however can be tackled in a
separate JIRA.
> Many ZooKeeper tests are flaky because they proceed with zk operation without
> connecting to ZooKeeper server.
> -------------------------------------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-2508
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2508
> Project: ZooKeeper
> Issue Type: Bug
> Components: tests
> Reporter: Arshad Mohammad
> Assignee: Arshad Mohammad
> Fix For: 3.5.3
>
> Attachments: ZOOKEEPER-2508-01.patch, ZOOKEEPER-2508-02.patch,
> ZOOKEEPER-2508-03.patch, ZOOKEEPER-2508-04.patch
>
>
> Many ZooKeeper tests are flaky because they proceed with zk operation without
> connecting to ZooKeeper server.
> Recently in our build
> {{org.apache.zookeeper.server.ZooKeeperServerMainTest.testStandalone()}}
> failed.
> After analyzing we found that it is failed because it is not waiting for
> ZooKeeper client get connected to server. In this case normally zookeeper
> client gets connected immediately but if not connected immediately the test
> case is bound to fail.
> Not only ZooKeeperServerMainTest but there are many other classes which have
> such test cases. This jira is to address all those test cases.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)