GitHub user tillrohrmann opened a pull request: https://github.com/apache/flink/pull/1173
[FLINK-2616] [test-stability] Fixes ZooKeeperLeaderElectionTest.testMultipleLeaders by introducing second retrieval service

I think this time I've figured out why the `ZooKeeperLeaderElectionTest.testMultipleLeaders` test case sometimes failed. Apparently, Curator's `NodeCache` does not receive all node changes. If, for example, the node's data has been changed twice, the `NodeCache` eventually sees only the most recent state. This led to problems in the test case, because the `LeaderRetrievalListener` did not see the first changed leader address.

The `ZooKeeperLeaderRetrievalService` only notifies the `LeaderRetrievalListener` about a new leader if the address read from the ZooKeeper node differs from the last address it read. If the node cache misses the first changed leader address and only sees the overwritten (corrected) address, then the service won't notify the listener, because from its point of view nothing has changed. Therefore, the test failed because it was waiting for the leader address to change.

I resolved the test failure by using a second `LeaderRetrievalService` which is only started after the faulty leader information has been written to ZooKeeper. That way we can be sure that whatever leader information it sees, the faulty or the corrected data, it sees for the first time and therefore notifies its listener.
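To illustrate the reasoning above, here is a minimal, self-contained sketch of the failure mode. It does not use Curator or the Flink classes. `Node`, `RetrievalService`, `refresh()` and the address strings are hypothetical stand-ins that only model "read the latest state, notify on change". The actual test and services differ in detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class ConflatedRetrievalSketch {

    /** Hypothetical stand-in for the ZooKeeper node holding the leader address. */
    static final class Node {
        private String data;
        void write(String value) { data = value; }
        String read() { return data; }
    }

    /**
     * Hypothetical stand-in for a retrieval service: it only notifies when the
     * address it reads differs from the last address it read.
     */
    static final class RetrievalService {
        private final Node node;
        private String lastSeen;
        final List<String> notifications = new ArrayList<>();

        RetrievalService(Node node) { this.node = node; }

        /** Models a cache refresh that only observes the node's current state. */
        void refresh() {
            String current = node.read();
            if (!Objects.equals(current, lastSeen)) {
                lastSeen = current;
                notifications.add(current);
            }
        }
    }

    /** A service started early can miss the intermediate (faulty) write entirely. */
    static List<String> firstServiceNotifications() {
        Node node = new Node();
        node.write("leader-A");
        RetrievalService first = new RetrievalService(node);
        first.refresh();          // notified about "leader-A"
        node.write("faulty");     // intermediate state, never observed
        node.write("leader-A");   // corrected before the next refresh
        first.refresh();          // same address as before: no notification
        return first.notifications;
    }

    /** A service started after the faulty write sees its first value as new. */
    static List<String> secondServiceNotifications() {
        Node node = new Node();
        node.write("leader-A");
        node.write("faulty");
        RetrievalService second = new RetrievalService(node); // started late
        node.write("leader-A");   // correction may or may not land before the refresh
        second.refresh();         // first read for this service: always notifies
        return second.notifications;
    }

    public static void main(String[] args) {
        System.out.println(firstServiceNotifications());   // prints [leader-A]
        System.out.println(secondServiceNotifications());  // prints [leader-A]
    }
}
```

In this model the first service receives exactly one notification and then waits forever for a change, whereas the second service is guaranteed one notification regardless of which write it observes.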
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink fixZooKeeperLeaderElectionTest2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1173.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1173

----
commit 573c3fac5f36df38f794b3a44f0573ff61c63ce4
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2015-09-23T12:34:38Z

    [FLINK-2616] [test-stability] Fixes ZooKeeperLeaderElectionTest.testMultipleLeaders by introducing a second retrieval service to retrieve the leader address after the faulty address has been written.

----