Dmitry Konstantinov created CURATOR-422:
-------------------------------------------

             Summary: PathChildrenCache is not tolerant to failed connection to ZK on startup
                 Key: CURATOR-422
                 URL: https://issues.apache.org/jira/browse/CURATOR-422
             Project: Apache Curator
          Issue Type: Bug
          Components: Recipes
    Affects Versions: 2.12.0
            Reporter: Dmitry Konstantinov


If PathChildrenCache is started while ZooKeeper is unavailable for long enough 
to exhaust the operation retries, and the parent node does not exist, then 
PathChildrenCache no longer watches for changes once the connection to 
ZooKeeper is restored.
Root cause: PathChildrenCache uses EnsureContainers, which has the following 
logic:
{code:java}
private synchronized void internalEnsure() throws Exception
    {
        if ( ensureNeeded.compareAndSet(true, false) )
        {
            client.createContainers(path);
        }
    }
{code}
This logic ignores the result of the operation: compareAndSet clears 
ensureNeeded before client.createContainers runs, so even if 
client.createContainers throws an exception and the nodes are not created, 
EnsureContainers will not try again on the next call.
Example of the exception:
{code}
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /test
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1073)
        at org.apache.curator.utils.ZKPaths.mkdirs(ZKPaths.java:274)
        at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:199)
        at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:193)
        at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
        at org.apache.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:190)
        at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:175)
        at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:32)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.createContainers(CuratorFrameworkImpl.java:194)
        at org.apache.curator.framework.EnsureContainers.internalEnsure(EnsureContainers.java:61)
        at org.apache.curator.framework.EnsureContainers.ensure(EnsureContainers.java:53)
        at org.apache.curator.framework.recipes.cache.PathChildrenCache.ensurePath(PathChildrenCache.java:576)
        at org.apache.curator.framework.recipes.cache.PathChildrenCache.refresh(PathChildrenCache.java:490)
        at org.apache.curator.framework.recipes.cache.RefreshOperation.invoke(RefreshOperation.java:35)
        at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:773)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{code}

As a result the watcher registered in 
org.apache.curator.framework.recipes.cache.PathChildrenCache#refresh is not 
triggered.
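
One possible way to make the ensure step retryable is to clear the 
ensureNeeded flag only after the create call has returned normally. The sketch 
below is not Curator's code and is not a proposed patch: RetryableEnsure and 
ContainerCreator are hypothetical names, and ContainerCreator merely stands in 
for client.createContainers(path) so the example runs without a ZooKeeper 
connection.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration; not the actual EnsureContainers class.
class RetryableEnsure {
    // Hypothetical stand-in for client.createContainers(path).
    interface ContainerCreator {
        void createContainers(String path) throws Exception;
    }

    private final ContainerCreator creator;
    private final String path;
    private final AtomicBoolean ensureNeeded = new AtomicBoolean(true);

    RetryableEnsure(ContainerCreator creator, String path) {
        this.creator = creator;
        this.path = path;
    }

    // synchronized keeps concurrent callers from racing on the create call.
    synchronized void ensure() throws Exception {
        if (ensureNeeded.get()) {
            creator.createContainers(path);
            // Reached only if createContainers did not throw, so a failed
            // attempt leaves the flag set and the next call tries again.
            ensureNeeded.set(false);
        }
    }

    boolean isEnsureNeeded() {
        return ensureNeeded.get();
    }
}
```

With this ordering, a ConnectionLoss during the first call leaves the flag 
set, and a later call attempts the creation again.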

Test to reproduce:
{code:java}
@Test
public void test() throws Exception {
    TestingServer zkTestServer = new TestingServer(2181, false);

    CuratorFramework curatorFramework = CuratorFrameworkFactory.newClient(
            zkTestServer.getConnectString(),
            5000,
            1000,
            new RetryOneTime(100)
    );
    curatorFramework.start();
    PathChildrenCache cache = new PathChildrenCache(curatorFramework, "/test", true);
    cache.start(PathChildrenCache.StartMode.POST_INITIALIZED_EVENT);

    Thread.sleep(5000);

    zkTestServer.start();
    curatorFramework.create().creatingParentContainersIfNeeded().forPath("/test/example");

    while(true) {
        Thread.sleep(1000);
        System.out.println(cache.getCurrentData());
    }
}
{code}
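
The reproduction above prints the cache contents forever, so it has to be 
checked by eye. A bounded polling helper could turn it into a pass/fail test. 
The following is a minimal sketch under that assumption; Await and until are 
hypothetical names, not part of Curator or JUnit.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper: polls a condition until it holds or a timeout expires.
final class Await {
    private Await() {
    }

    // Returns true as soon as the condition holds, false once timeoutMs has
    // elapsed without the condition holding.
    static boolean until(BooleanSupplier condition, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(pollMs);
        }
    }
}
```

In the test, the infinite loop could then be replaced with an assertion such 
as assertTrue(Await.until(() -> !cache.getCurrentData().isEmpty(), 10000, 100)), 
which fails while the bug is present instead of looping forever.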



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
