Dmitry Konstantinov created CURATOR-422:
-------------------------------------------
Summary: PathChildrenCache is not tolerant to failed connection to
ZK on startup
Key: CURATOR-422
URL: https://issues.apache.org/jira/browse/CURATOR-422
Project: Apache Curator
Issue Type: Bug
Components: Recipes
Affects Versions: 2.12.0
Reporter: Dmitry Konstantinov
If PathChildrenCache is started while ZooKeeper is unavailable for long enough
to exhaust the operation retries, and the parent node does not exist, then
PathChildrenCache no longer watches for changes once the connection to
ZooKeeper is resumed.
Root cause: PathChildrenCache uses EnsureContainers which has the following
logic:
{code:java}
private synchronized void internalEnsure() throws Exception
{
    if ( ensureNeeded.compareAndSet(true, false) )
    {
        client.createContainers(path);
    }
}
{code}
This logic ignores the result of the operation. ensureNeeded is set to false
before client.createContainers is called, so even if client.createContainers
throws an exception and the nodes are not created, EnsureContainers will not
try to create them on the next call.
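One possible way to make the logic aware of the result is to clear the flag only after the creation call has returned normally. The following is a minimal, self-contained sketch of that idea, not a patch against Curator: RetryingEnsure and ContainerCreator are hypothetical stand-ins for EnsureContainers and for the client.createContainers call.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for EnsureContainers; illustrates the idea only.
final class RetryingEnsure {
    // Hypothetical stand-in for client.createContainers(path).
    interface ContainerCreator {
        void createContainers(String path) throws Exception;
    }

    private final ContainerCreator client;
    private final String path;
    private final AtomicBoolean ensureNeeded = new AtomicBoolean(true);

    RetryingEnsure(ContainerCreator client, String path) {
        this.client = client;
        this.path = path;
    }

    synchronized void ensure() throws Exception {
        if (ensureNeeded.get()) {
            // If this throws, the flag stays true and the next call retries.
            client.createContainers(path);
            // Reached only when creation succeeded.
            ensureNeeded.set(false);
        }
    }
}
```

With this shape, a ConnectionLoss on the first call leaves ensureNeeded set, so a later refresh after reconnection attempts the creation again.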
Example of the exception:
{code}
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /test
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1073)
    at org.apache.curator.utils.ZKPaths.mkdirs(ZKPaths.java:274)
    at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:199)
    at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:193)
    at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
    at org.apache.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:190)
    at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:175)
    at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:32)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.createContainers(CuratorFrameworkImpl.java:194)
    at org.apache.curator.framework.EnsureContainers.internalEnsure(EnsureContainers.java:61)
    at org.apache.curator.framework.EnsureContainers.ensure(EnsureContainers.java:53)
    at org.apache.curator.framework.recipes.cache.PathChildrenCache.ensurePath(PathChildrenCache.java:576)
    at org.apache.curator.framework.recipes.cache.PathChildrenCache.refresh(PathChildrenCache.java:490)
    at org.apache.curator.framework.recipes.cache.RefreshOperation.invoke(RefreshOperation.java:35)
    at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:773)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{code}
As a result, the watcher registered in
org.apache.curator.framework.recipes.cache.PathChildrenCache#refresh is not
triggered.
Test to reproduce:
{code:java}
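@Test
public void test() throws Exception {
    // Create the test server on port 2181 without starting it.
    TestingServer zkTestServer = new TestingServer(2181, false);
    CuratorFramework curatorFramework = CuratorFrameworkFactory.newClient(
        zkTestServer.getConnectString(),
        5000,                  // session timeout, ms
        1000,                  // connection timeout, ms
        new RetryOneTime(100)  // retry once after 100 ms
    );
    curatorFramework.start();
    // Start the cache while ZooKeeper is still down and "/test" does not exist.
    PathChildrenCache cache = new PathChildrenCache(curatorFramework, "/test", true);
    cache.start(PathChildrenCache.StartMode.POST_INITIALIZED_EVENT);
    // Wait long enough for the retries to be exhausted.
    Thread.sleep(5000);
    // Bring ZooKeeper up and create a child under the watched path.
    zkTestServer.start();
    curatorFramework.create().creatingParentContainersIfNeeded().forPath("/test/example");
    // Poll the cache forever; with the bug, "/test/example" never shows up.
    while (true) {
        Thread.sleep(1000);
        System.out.println(cache.getCurrentData());
    }
}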
{code}
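The reproduction above prints the cache contents in an endless loop, so it has to be watched by hand. To turn it into a test that fails on its own, the loop could be replaced with a bounded poll. The helper below is a hypothetical sketch (PollUntil and nonEmptyWithin are names invented here, not part of Curator or JUnit) that depends only on the JDK.

```java
import java.util.Collection;
import java.util.function.Supplier;

// Hypothetical helper: polls a data source until it is non-empty or time runs out.
final class PollUntil {
    private PollUntil() {}

    // Returns true as soon as source yields a non-empty collection,
    // or false once timeoutMs has elapsed without seeing any data.
    static boolean nonEmptyWithin(Supplier<? extends Collection<?>> source,
                                  long timeoutMs, long intervalMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (!source.get().isEmpty()) {
                return true;
            }
            // Overflow-safe comparison against the deadline.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(intervalMs);
        }
    }
}
```

In the test, the while loop could then become an assertion such as assertTrue(PollUntil.nonEmptyWithin(cache::getCurrentData, 10000, 100)), where the 10 second timeout is an arbitrary choice.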