[
https://issues.apache.org/jira/browse/CURATOR-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jordan Zimmerman resolved CURATOR-559.
--------------------------------------
Resolution: Fixed
For Curator 5.0.0 TestThreadLocalRetryLoop now uses a foreground Curator
operation so that the tests are reliable.
> Inconsistent ZK timeouts
> ------------------------
>
> Key: CURATOR-559
> URL: https://issues.apache.org/jira/browse/CURATOR-559
> Project: Apache Curator
> Issue Type: Bug
> Components: Framework
> Affects Versions: 4.2.0, 4.3.0
> Reporter: Grant Digby
> Assignee: Jordan Zimmerman
> Priority: Blocker
> Fix For: 5.0.0, 4.3.0
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> I've configured a reasonable timeout using BoundedExponentialBackoffRetry,
> and generally it works as I'd expect if ZK is down when I make a call like
> "create.forPath". But if ZK is unavailable when I call acquire on an
> InterProcessReadWriteLock, it takes far longer before it finally times out.
> I call acquire, which is wrapped in "RetryLoop.callWithRetry", and it goes on
> to call findProtectedNodeInForeground, which is also wrapped in
> "RetryLoop.callWithRetry". If I've configured the
> BoundedExponentialBackoffRetry to retry 20 times, the inner retry tries 20
> times for every one of the 20 outer retry loops, so it retries 400 times.
>
> This class reproduces it. If you put breakpoints at the commented lines
> and bring ZK down, you can see the different times until it disconnects,
> along with the stack traces I've included below.
>
> {code:java}
> import org.apache.curator.framework.CuratorFramework;
> import org.apache.curator.framework.CuratorFrameworkFactory;
> import org.apache.curator.framework.recipes.locks.InterProcessReadWriteLock;
> import org.apache.curator.retry.BoundedExponentialBackoffRetry;
>
> public class GoCurator {
>     public static void main(String[] args) throws Exception {
>         CuratorFramework cf = CuratorFrameworkFactory.newClient(
>             "localhost:2181",
>             new BoundedExponentialBackoffRetry(200, 10000, 20)
>         );
>         cf.start();
>         String root = "/myRoot";
>         if (cf.checkExists().forPath(root) == null) {
>             // Stacktrace A showing what happens if ZK is down for this call
>             cf.create().forPath(root);
>         }
>         InterProcessReadWriteLock lock = new InterProcessReadWriteLock(cf,
>             "/grant/myLock");
>         // See stacktrace B showing the nested re-try if ZK is down for this call
>         lock.readLock().acquire();
>         lock.readLock().release();
>         System.out.println("done");
>     }
> }
> {code}
>
> Stacktrace A (if ZK is down when I'm calling create().forPath). This shows
> the single retry loop, so it exits after the correct number of attempts:
>
> {code:java}
> java.lang.Thread.State: WAITING
> at java.lang.Object.wait(Object.java:-1)
> at java.lang.Object.wait(Object.java:502)
> at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1499)
> at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1487)
> at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:2617)
> at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:242)
> at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:231)
> at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64)
> at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)
> at org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:228)
> at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:219)
> at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:41)
> at com.gebatech.curator.GoCurator.main(GoCurator.java:25) {code}
> Stacktrace B (if ZK is down when I call
> InterProcessReadWriteLock#readLock#acquire). This shows the nested re-try
> loop so it doesn't exit until 20*20 attempts.
>
> {code:java}
> java.lang.Thread.State: WAITING
> at sun.misc.Unsafe.park(Unsafe.java:-1)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
> at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
> at org.apache.curator.CuratorZookeeperClient.internalBlockUntilConnectedOrTimedOut(CuratorZookeeperClient.java:434)
> at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:56)
> at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)
> at org.apache.curator.framework.imps.CreateBuilderImpl.findProtectedNodeInForeground(CreateBuilderImpl.java:1239)
> at org.apache.curator.framework.imps.CreateBuilderImpl.access$1700(CreateBuilderImpl.java:51)
> at org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1167)
> at org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1156)
> at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64)
> at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)
> at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:1153)
> at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:607)
> at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:597)
> at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:575)
> at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:51)
> at org.apache.curator.framework.recipes.locks.StandardLockInternalsDriver.createsTheLock(StandardLockInternalsDriver.java:54)
> at org.apache.curator.framework.recipes.locks.LockInternals.attemptLock(LockInternals.java:225)
> at org.apache.curator.framework.recipes.locks.InterProcessMutex.internalLock(InterProcessMutex.java:237)
> at org.apache.curator.framework.recipes.locks.InterProcessMutex.acquire(InterProcessMutex.java:89)
> at com.gebatech.curator.GoCurator.main(GoCurator.java:29) {code}
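
The 20 x 20 arithmetic described in the report can be reproduced without
ZooKeeper or Curator. The following is a minimal sketch, not Curator's code:
NestedRetryDemo, callWithRetry, countSingleLoopAttempts and
countNestedLoopAttempts are hypothetical names invented for illustration, the
operation always fails to stand in for an unavailable server, and the
configured retry count is treated as the total number of attempts, with no
sleeping between them.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

public class NestedRetryDemo {

    /**
     * Hypothetical stand-in for a retry helper: invokes op until it succeeds
     * or maxAttempts calls have failed, then rethrows the last failure.
     */
    static <T> T callWithRetry(int maxAttempts, Callable<T> op) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    /** One retry loop around an operation that always fails. */
    static int countSingleLoopAttempts(int maxAttempts) {
        AtomicInteger calls = new AtomicInteger();
        Callable<Void> failing = () -> {
            calls.incrementAndGet();
            throw new IllegalStateException("server unavailable");
        };
        try {
            callWithRetry(maxAttempts, failing);
        } catch (Exception expected) {
            // every attempt failed, which is the case being measured
        }
        return calls.get();
    }

    /** A retry loop whose body is itself wrapped in a second retry loop. */
    static int countNestedLoopAttempts(int maxAttempts) {
        AtomicInteger calls = new AtomicInteger();
        Callable<Void> failing = () -> {
            calls.incrementAndGet();
            throw new IllegalStateException("server unavailable");
        };
        // The outer loop sees one failure only after the inner loop has
        // exhausted all of its own attempts.
        Callable<Void> outerBody = () -> callWithRetry(maxAttempts, failing);
        try {
            callWithRetry(maxAttempts, outerBody);
        } catch (Exception expected) {
            // every attempt failed, which is the case being measured
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("single loop: " + countSingleLoopAttempts(20) + " attempts");
        System.out.println("nested loops: " + countNestedLoopAttempts(20) + " attempts");
    }
}
```

With 20 attempts configured, the single loop makes 20 calls and the nested
loops make 400, which matches the difference between stacktrace A and
stacktrace B. With a real backoff policy each of those calls would also be
followed by a sleep, so the total wait grows accordingly.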