[ https://issues.apache.org/jira/browse/ZOOKEEPER-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15308625#comment-15308625 ]
Michael Han commented on ZOOKEEPER-2152:
----------------------------------------
So far I've been trying to reproduce these test failures and capture the
context in which they occur. Here are my findings.
# testcycleNextServer()
This test case always succeeds, as expected. I have one question here:
{code}
for (int i = 0; i < 10; i++)
{
    string next = client.cycleNextServer();
}
{code}
IIUC, the intent here is to test cycleNextServer. If so, should we validate
the 'next' variable after each call to cycleNextServer? I'm not sure why
there is no validation here, since in this context cycleNextServer
deterministically returns results in order. [~marshall] Do you have any
comments on this?
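A minimal sketch of the kind of per-call validation being suggested, assuming cycleNextServer hands out servers from a fixed list in round-robin order. FakeClient and firstOutOfOrderCall are hypothetical names for illustration only; they are not part of TestReconfig.cc or the ZooKeeper C client, and the real loop would simply use CPPUNIT_ASSERT_EQUAL on each 'next'.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the test's Client wrapper. It hands out servers
// from a fixed list in round-robin order, modelling the deterministic
// ordering that cycleNextServer() is assumed to have in this test.
class FakeClient {
public:
    explicit FakeClient(std::vector<std::string> servers)
        : servers_(std::move(servers)), pos_(0) {}

    std::string cycleNextServer() {
        std::string current = servers_[pos_];
        pos_ = (pos_ + 1) % servers_.size();  // wrap around to the first server
        return current;
    }

private:
    std::vector<std::string> servers_;
    std::size_t pos_;
};

// Calls cycleNextServer() `calls` times and compares each result with the
// expected sequence (repeating from the start once it is exhausted).
// Returns the index of the first mismatching call, or -1 if every call
// matched. In the real CppUnit test each comparison would instead be a
// CPPUNIT_ASSERT_EQUAL(expected, next) inside the existing loop.
int firstOutOfOrderCall(FakeClient &client,
                        const std::vector<std::string> &expectedOrder,
                        int calls) {
    for (int i = 0; i < calls; i++) {
        std::string next = client.cycleNextServer();
        if (next != expectedOrder[i % expectedOrder.size()]) {
            return i;
        }
    }
    return -1;
}
```

Checking every call, rather than only the last one, pinpoints which step of the cycle first diverges from the expected order.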
# testMigrateOrNot()
This is the first test case for which I can reproduce failures fairly
consistently. The failure occurs on this line:
{code}
// Ensemble size decreasing, my server is NOT in the new list
client.setServersAndVerifyReconfig(createHostList(2), true);
{code}
It fails because, before this call, the client is connected to '10.10.10.2'
rather than the expected '10.10.10.3'. Since the new server list created by
createHostList(2) already contains '10.10.10.2', no reconfiguration happens,
which contradicts the second argument (true means we expect a
reconfiguration).
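The reasoning above can be sketched as a small model, under stated assumptions: makeHostList is a hypothetical approximation of createHostList (the "10.10.10.n:200n" scheme follows the "2004..2001" comment in the test, but the separator is assumed), and mustMigrate models only the shrinking-ensemble case, where a move is forced exactly when the current server is missing from the new list. Neither function is part of the actual test or client code.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical approximation of the test helper createHostList(n): a
// comma-separated list "10.10.10.n:200n,...,10.10.10.1:2001". The address
// scheme follows the "2004..2001" comment in the test; the exact separator
// used by the real helper is an assumption.
std::string makeHostList(int n) {
    std::ostringstream out;
    for (int i = n; i >= 1; i--) {
        out << "10.10.10." << i << ":" << (2000 + i);
        if (i > 1) {
            out << ",";
        }
    }
    return out.str();
}

// Simplified model of the shrinking-ensemble case only: the client is forced
// to move to another server exactly when its current server is missing from
// the new list. This deliberately ignores any probabilistic rebalancing the
// real client may do in other cases.
bool mustMigrate(const std::string &currentServer, const std::string &newList) {
    std::istringstream in(newList);
    std::string host;
    while (std::getline(in, host, ',')) {
        if (host == currentServer) {
            return false;  // still in the list: no reconfiguration expected
        }
    }
    return true;
}
```

Under this model, starting from 10.10.10.3:2003 the switch to makeHostList(2) must migrate, but starting from 10.10.10.2:2002 it does not, which is why passing true to setServersAndVerifyReconfig fails when the client has already ended up on '10.10.10.2'.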
We can use a simplified test case to demonstrate the issue:
{code}
void testCurrentConnectedServer() {
    const string initial_hosts = createHostList(4); // 2004..2001
    Client &client = createClient(initial_hosts, "10.10.10.3");
    // Ensemble size decreasing, my server is in the new list
    client.setServersAndVerifyReconfig(createHostList(3), false);
    const string expectedServer = "10.10.10.3:2003";
    CPPUNIT_ASSERT_MESSAGE("Current connected server should be " + expectedServer,
                           client.getServer() == expectedServer);
}
{code}
I'll dig further into the call stack to identify the root cause. Meanwhile, if
anyone is interested in reproducing this on a Mac, I have a simple Xcode app [1]
that replicates the exact test logic in TestReconfig. For some reason, I can
reproduce the test failure much more easily in Xcode than when running it as
part of the CppUnit test suite (on Ubuntu).
[1] https://github.com/hanm/zk-tools/tree/dynamic_reconfig
> Intermittent failure in TestReconfig.cc
> ---------------------------------------
>
> Key: ZOOKEEPER-2152
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2152
> Project: ZooKeeper
> Issue Type: Sub-task
> Components: c client
> Reporter: Michi Mutsuzaki
> Assignee: Michael Han
> Labels: reconfiguration
> Fix For: 3.6.0
>
>
> I'm seeing this failure in the c client test once in a while:
> {noformat}
> [exec]
> /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/trunk/src/c/tests/TestReconfig.cc:474:
> Assertion: assertion failed [Expression: found != string::npos,
> 10.10.10.4:2004 not in newComing list]
> {noformat}
> https://builds.apache.org/job/ZooKeeper-trunk/2640/console
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)