[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15309252#comment-15309252
 ] 

Michael Han commented on ZOOKEEPER-2152:
----------------------------------------

The root cause of the failures in testMigrateOrNot() is as follows.
I think our reconfiguration tests assume an invariant: the server the client 
is currently connected to is uniquely determined by the calls to 
cycleNextServer in our tests (which calls zoo_cycle_next_server). This 
assumption does not hold, because cycleNextServer is not the only place where 
zoo_cycle_next_server gets called: zookeeper_interest in the client IO thread 
calls it as well. Because our reconfiguration client tests do not have a real 
server set up, the client ends up cycling to the next server on each 
reconnect attempt:
{code}
// No need to delay -- grab the next server and attempt connection
zoo_cycle_next_server(zh);
{code}

The calls to zoo_cycle_next_server from both our tests and the ZK IO thread 
end up randomizing which server the client treats as its current one. Since 
most of our tests rely on this state, they fail or pass randomly depending on 
timing. This also explains why the MT tests failed more often than the ST 
tests.
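
To make the interleaving concrete, here is a minimal toy model in C. It is 
not the ZooKeeper client code: toy_handle, toy_cycle_next_server, 
toy_test_step and the server addresses are hypothetical stand-ins, and the 
sketch assumes a plain round-robin cursor. It only shows how an extra cycle 
from another caller changes the server a test observes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the client handle: a server list and a cursor. */
typedef struct {
    const char *const *servers;
    size_t count;
    size_t current; /* index of the server the client treats as current */
} toy_handle;

/* Toy analogue of zoo_cycle_next_server: advance the cursor round-robin. */
static const char *toy_cycle_next_server(toy_handle *h) {
    assert(h->count > 0);
    h->current = (h->current + 1) % h->count;
    return h->servers[h->current];
}

/* One test step: the test cycles once, then io_reconnects additional cycles
 * happen on the (simulated) IO path before the test inspects the state. */
static const char *toy_test_step(toy_handle *h, int io_reconnects) {
    toy_cycle_next_server(h);          /* the call the test accounts for */
    for (int i = 0; i < io_reconnects; i++)
        toy_cycle_next_server(h);      /* calls the test does not account for */
    return h->servers[h->current];
}
```

With three servers and the cursor at index 0, toy_test_step(&h, 0) lands on 
index 1 as the test would expect, while toy_test_step(&h, 1) lands on index 
2. In the real tests the number of extra cycles depends on timing, which is 
why the outcome varies from run to run.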

I'll prepare a patch - my current idea is to set zh->delay in our tests, 
which effectively disables the zoo_cycle_next_server call in the ZK IO thread.
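
The idea can be sketched with a toy model in C. This is an illustration 
under assumptions, not the patch and not the real zookeeper_interest logic: 
toy_handle, toy_cycle and toy_io_reconnect are hypothetical names, and the 
sketch assumes the delay flag stays set for as long as the test needs it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the client handle. */
typedef struct {
    size_t count;   /* number of servers in the list */
    size_t current; /* index of the server the client treats as current */
    int delay;      /* assumed gate: nonzero means "do not cycle yet" */
} toy_handle;

/* Advance the cursor round-robin. */
static void toy_cycle(toy_handle *h) {
    assert(h->count > 0);
    h->current = (h->current + 1) % h->count;
}

/* Toy analogue of the reconnect branch on the IO path: cycle to the next
 * server only when no delay is requested. */
static void toy_io_reconnect(toy_handle *h) {
    if (h->delay)
        return;     /* gate closed: the cursor is left untouched */
    toy_cycle(h);
}
```

With delay set, repeated toy_io_reconnect calls leave current unchanged, so 
only the test's own cycling moves the cursor. Clearing delay restores the 
cycling on each reconnect attempt.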


> Intermittent failure in TestReconfig.cc
> ---------------------------------------
>
>                 Key: ZOOKEEPER-2152
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2152
>             Project: ZooKeeper
>          Issue Type: Sub-task
>          Components: c client
>            Reporter: Michi Mutsuzaki
>            Assignee: Michael Han
>              Labels: reconfiguration
>             Fix For: 3.6.0
>
>
> I'm seeing this failure in the c client test once in a while:
> {noformat}
> [exec] 
> /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/trunk/src/c/tests/TestReconfig.cc:474:
>  Assertion: assertion failed [Expression: found != string::npos, 
> 10.10.10.4:2004 not in newComing list]
> {noformat}
> https://builds.apache.org/job/ZooKeeper-trunk/2640/console



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
