GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/1173

    [FLINK-2616] [test-stability] Fixes 
ZooKeeperLeaderElectionTest.testMultipleLeaders by introducing second retrieval 
service

    I think I have now figured out why the 
`ZooKeeperLeaderElectionTest.testMultipleLeaders` test case sometimes failed. 
Apparently, Curator's `NodeCache` does not deliver every node change. If, for 
example, the node's data has been changed twice, the `NodeCache` may 
eventually see only the most recent state. This caused problems in the test 
case, because the `LeaderRetrievalListener` never saw the first (faulty) 
leader address. The `ZooKeeperLeaderRetrievalService` only notifies the 
`LeaderRetrievalListener` about a new leader if the address read from the 
ZooKeeper node differs from the previously read one. If the node cache 
misses the first change and only sees the overwritten (corrected) address, 
the service does not notify the listener, because from its point of view 
nothing has changed. The test therefore failed while waiting for a leader 
address change that was never reported.
    
    I resolved the test failure by using a second `LeaderRetrievalService` 
that is started only after the faulty leader information has been written to 
ZooKeeper. That way we can be sure that whatever leader information it reads 
first, whether the faulty or the corrected data, is new to it and therefore 
triggers a notification.
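
    To illustrate the reasoning, here is a minimal, self-contained sketch of 
the "notify only on change" behaviour described above. It is a simplified 
model, not the actual Flink or Curator code: the class `DedupingRetrieval`, 
the helper `deliver`, and the address strings are all hypothetical names 
chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class LeaderRetrievalSketch {

    /** Hypothetical stand-in for a retrieval service that only reports changed addresses. */
    static final class DedupingRetrieval {
        private String lastAddress; // null until the first node state has been observed
        final List<String> notifications = new ArrayList<>();

        /** Called with each node state that the (possibly coalescing) cache delivers. */
        void onNodeState(String address) {
            if (!Objects.equals(address, lastAddress)) {
                lastAddress = address;
                notifications.add(address); // models notifying the listener
            }
        }
    }

    /** Feeds the observed node states to the service and returns all notifications so far. */
    static List<String> deliver(DedupingRetrieval service, String... observedStates) {
        for (String state : observedStates) {
            service.onNodeState(state);
        }
        return service.notifications;
    }

    public static void main(String[] args) {
        String correct = "leader-address"; // placeholder, not a real Flink address

        // Service started before the faulty write: it already knows the correct address.
        DedupingRetrieval early = new DedupingRetrieval();
        deliver(early, correct);
        // The cache missed the faulty write and only delivers the corrected state,
        // which equals the last seen address, so no second notification is recorded.
        deliver(early, correct);
        System.out.println("early service notifications: " + early.notifications.size());

        // Service started after the faulty write: it has no last seen address yet,
        // so the first state it observes always produces a notification.
        DedupingRetrieval late = new DedupingRetrieval();
        deliver(late, correct);
        System.out.println("late service notifications: " + late.notifications.size());
    }
}
```

    In this model the early service stays at one notification when the faulty 
write is missed, whereas the freshly started service is notified by whichever 
state it observes first. That is the property the fixed test relies on.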

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink 
fixZooKeeperLeaderElectionTest2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1173.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1173
    
----
commit 573c3fac5f36df38f794b3a44f0573ff61c63ce4
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2015-09-23T12:34:38Z

    [FLINK-2616] [test-stability] Fixes 
ZooKeeperLeaderElectionTest.testMultipleLeaders by introducing a second 
retrieval service to retrieve the leader address after the faulty address has 
been written.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
