Gerd Behrmann created CURATOR-328:
-------------------------------------

             Summary: PathChildrenCache fails silently if server is unavailable for sufficient time when client starts
                 Key: CURATOR-328
                 URL: https://issues.apache.org/jira/browse/CURATOR-328
             Project: Apache Curator
          Issue Type: Bug
          Components: Recipes
    Affects Versions: 2.10.0
            Reporter: Gerd Behrmann


When initializing the PathChildrenCache, if the Curator client is not yet 
connected to the ZooKeeper server (e.g. the server is down or the network 
connection is unavailable), the internal initialization of the cache 
eventually fails silently, and the cache stays empty even after the client 
finally connects to the server and the path is populated with znodes.

The following unit test demonstrates the problem (the unit test is ugly as the 
problem depends on timing, but it suffices to demonstrate the issue):

{code:java}
    @Test
    public void pathChildrenCacheTest() throws Exception
    {
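        // Create the server without starting it. Declared final so the
        // anonymous Thread below can reference it on pre-Java 8 compilers.
        final TestingServer server = new TestingServer(false);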

        Timing timing = new Timing();
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                server.getConnectString(), timing.session(),
                timing.connection(), new ExponentialBackoffRetry(1000, 3));
        try {
            new Thread() {
                @Override
                public void run()
                {
                    try {
                        Thread.sleep(60000);
                        server.start();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }.start();

            client.start();

            PathChildrenCache cache = new PathChildrenCache(client, "/", true);
            cache.start();

            client.blockUntilConnected();

            client.create().creatingParentContainersIfNeeded()
                    .forPath("/baz", new byte[] {1,2,3});

            assertNotNull("/baz does not exist",
                    client.checkExists().forPath("/baz"));

            /* Ugly hack for this test to ensure the cache got time to update
               itself. */
            Thread.sleep(1000);

            assertNotNull("cache doesn't see /baz",
                    cache.getCurrentData("/baz"));
        } finally {
            client.close();
            server.stop();
        }
    }
{code}

Here the server startup is delayed until some point after the Curator client 
has been started and the recipe has been created. Eventually the server starts 
and the path is populated with data. Some time is given for the cache to 
update itself, yet no data is visible: the second assertion fails.

If the startup delay (the Thread.sleep(60000) in the test) is reduced to, say, 
20 seconds, the test passes.

If the client is allowed to first connect to the server before creating the 
recipe and then disconnect and reconnect after creating the recipe, then the 
test passes too.

I tracked the problem down to the state change listener of the recipe. If the 
connection to the server is down for long enough, the refresh call during the 
background initialization eventually fails (ensurePath throws an exception). 
By itself this would not be a problem, as the recipe has a state change 
listener and is notified when the client eventually connects to the server. 
However, the handleStateChange method does not react to a CONNECTED event, 
only to a RECONNECTED event. Thus if the client has been connected to the 
server in the past, everything works; if this is the first time it connects, 
the recipe does not react to the event and never refreshes itself.
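
The difference can be illustrated with a minimal, self-contained sketch. This 
is not the actual Curator code: the State enum is a hypothetical stand-in that 
mirrors a subset of Curator's ConnectionState values, and both method names 
are made up for illustration. It only models which events would trigger a 
refresh, assuming a fix along the lines of also handling CONNECTED.

```java
public class StateChangeSketch {
    /** Hypothetical stand-in for a subset of Curator's ConnectionState values. */
    public enum State { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    /** Behaviour as described in this report: only RECONNECTED triggers a refresh. */
    public static boolean refreshesAsReported(State newState) {
        return newState == State.RECONNECTED;
    }

    /** Possible fix (an assumption, not the committed change): also react to CONNECTED. */
    public static boolean refreshesAsProposed(State newState) {
        return newState == State.CONNECTED || newState == State.RECONNECTED;
    }

    public static void main(String[] args) {
        // A client that connects for the first time delivers CONNECTED;
        // one that regains a previously established connection delivers RECONNECTED.
        for (State s : State.values()) {
            System.out.println(s + ": reported=" + refreshesAsReported(s)
                    + ", proposed=" + refreshesAsProposed(s));
        }
    }
}
```

With the reported behaviour, a first-time CONNECTED event is ignored, which 
matches the cache staying empty in the test above; with the proposed variant, 
the same event would trigger a refresh.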



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
