[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15308625#comment-15308625
 ] 

Michael Han commented on ZOOKEEPER-2152:
----------------------------------------

So far I've been trying to reproduce these test failures and capture the 
context of the failures. Here are my findings.

# testcycleNextServer()
This test case always succeeds, as expected. I have one question here:
{code}
for (int i = 0; i < 10; i++)
{
   string next = client.cycleNextServer();
}
{code}
IIUC, the intent here is to test cycleNextServer. If so, should we validate 
the 'next' variable after each call to cycleNextServer? I am not sure why 
there is no validation, since cycleNextServer in this context 
deterministically produces ordered results. [~marshall] Do you have any 
comments on this?
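
A rough sketch of the per-call validation suggested above. FakeClient and cycleOrderIsValid are hypothetical stand-ins, not the real Client from TestReconfig.cc, and the sketch assumes the servers come back in list order and wrap around:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the test's Client: it cycles through a fixed,
// ordered server list. The real cycleNextServer() may behave differently.
class FakeClient {
public:
    explicit FakeClient(std::vector<std::string> servers)
        : servers_(std::move(servers)), next_(0) {
        assert(!servers_.empty());
    }

    // Returns the next server in list order, wrapping around at the end.
    std::string cycleNextServer() {
        const std::string server = servers_[next_];
        next_ = (next_ + 1) % servers_.size();
        return server;
    }

private:
    std::vector<std::string> servers_;
    std::size_t next_;
};

// Calls cycleNextServer() `rounds` times and compares each result with the
// expected entry of the ordered list. Returns false on the first mismatch.
bool cycleOrderIsValid(FakeClient &client,
                       const std::vector<std::string> &expected,
                       std::size_t rounds) {
    for (std::size_t i = 0; i < rounds; i++) {
        const std::string next = client.cycleNextServer();
        if (next != expected[i % expected.size()]) {
            return false;
        }
    }
    return true;
}
```

In the actual test the comparison would presumably be a CppUnit assertion inside the existing loop rather than a boolean helper.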

# testMigrateOrNot()
This is the first test case in which I can reproduce failures fairly often. 
The failure occurs on this line:
{code} 
// Ensemble size decreasing, my server is NOT in the new list
client.setServersAndVerifyReconfig(createHostList(2), true); 
{code}
It fails because the server connected before this call was '10.10.10.2' 
instead of the expected '10.10.10.3'. Since the new server list created by 
createHostList(2) already contains '10.10.10.2', no reconfiguration takes 
place, which contradicts the second parameter (true means we expect a 
reconfiguration).
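
To make the reasoning concrete, here is a minimal sketch of the membership check involved. The createHostList below is a hypothetical reconstruction based only on the addresses quoted in this issue (10.10.10.N:200N, highest first), and hostListContains and mustReconfig are illustrative names, not functions from the C client. It also ignores any load-balancing logic the real client may apply:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical reconstruction of the test helper: builds
// "10.10.10.N:200N,...,10.10.10.1:2001" for N = num_hosts (assumes 1..9).
std::string createHostList(int num_hosts) {
    assert(num_hosts >= 1 && num_hosts <= 9);
    std::ostringstream hosts;
    for (int i = num_hosts; i >= 1; i--) {
        hosts << "10.10.10." << i << ":200" << i;
        if (i > 1) {
            hosts << ",";
        }
    }
    return hosts.str();
}

// True if `server` appears as a whole entry in the comma-separated list.
bool hostListContains(const std::string &host_list, const std::string &server) {
    std::istringstream entries(host_list);
    std::string entry;
    while (std::getline(entries, entry, ',')) {
        if (entry == server) {
            return true;
        }
    }
    return false;
}

// Simplified expectation for a shrinking ensemble: the client has to move
// only when its current server is absent from the new list.
bool mustReconfig(const std::string &new_hosts, const std::string &current) {
    return !hostListContains(new_hosts, current);
}
```

Under these assumptions, mustReconfig(createHostList(2), "10.10.10.2:2002") is false while mustReconfig(createHostList(2), "10.10.10.3:2003") is true, which matches the failure described above.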

We can use a simplified test case to demonstrate the issue:
{code}
void testCurrentConnectedServer() {
  const string initial_hosts = createHostList(4); // 2004..2001
  Client &client = createClient(initial_hosts, "10.10.10.3");     
 
  // Ensemble size decreasing, my server is in the new list
  client.setServersAndVerifyReconfig(createHostList(3), false);
  const string expectedServer = "10.10.10.3:2003";
  CPPUNIT_ASSERT_MESSAGE("Current connected server should be " + expectedServer,
    client.getServer() == expectedServer);
}
{code}

I'll look further into the call stack to identify the root cause. Meanwhile, 
if anyone is interested in reproducing this on a Mac, I have a simple Xcode 
app [1] that replicates the exact test logic in TestReconfig. For some reason, 
I am able to reproduce the test failure much more easily in Xcode than when 
running it as part of the CPP unit tests (on Ubuntu).

[1] https://github.com/hanm/zk-tools/tree/dynamic_reconfig

> Intermittent failure in TestReconfig.cc
> ---------------------------------------
>
>                 Key: ZOOKEEPER-2152
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2152
>             Project: ZooKeeper
>          Issue Type: Sub-task
>          Components: c client
>            Reporter: Michi Mutsuzaki
>            Assignee: Michael Han
>              Labels: reconfiguration
>             Fix For: 3.6.0
>
>
> I'm seeing this failure in the c client test once in a while:
> {noformat}
> [exec] 
> /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/trunk/src/c/tests/TestReconfig.cc:474:
>  Assertion: assertion failed [Expression: found != string::npos, 
> 10.10.10.4:2004 not in newComing list]
> {noformat}
> https://builds.apache.org/job/ZooKeeper-trunk/2640/console



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
