[jira] [Created] (CURATOR-559) Inconsistent ZK timeouts

Grant Digby (Jira) Mon, 17 Feb 2020 03:11:08 -0800

Grant Digby created CURATOR-559:
-----------------------------------

             Summary: Inconsistent ZK timeouts
                 Key: CURATOR-559
                 URL: https://issues.apache.org/jira/browse/CURATOR-559
             Project: Apache Curator
          Issue Type: Bug
          Components: Framework
    Affects Versions: 4.2.0
            Reporter: Grant Digby



I've configured a reasonable timeout using BoundedExponentialBackoffRetry, and 
it generally works as I'd expect if ZK is down when I make a call like 
"create().forPath". But if ZK is unavailable when I call acquire on an 
InterProcessReadWriteLock, it takes far longer before it finally times out.

I call acquire, which ends up in CreateBuilderImpl.pathInForeground wrapped in 
"RetryLoop.callWithRetry", and that goes on to call 
findProtectedNodeInForeground, which is also wrapped in 
"RetryLoop.callWithRetry". If I've configured the 
BoundedExponentialBackoffRetry to retry 20 times, the inner loop retries 20 
times for each of the 20 outer retries, so it retries 400 times in total.
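
The multiplication can be reproduced without ZooKeeper. The following is only a 
minimal sketch: callWithRetry here is a hypothetical stand-in that I wrote for 
illustration, not Curator's RetryLoop.callWithRetry, and it has no backoff 
sleeps. It also treats the configured count as the number of attempts, which is 
a simplification. The class and method names are assumptions.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

public class NestedRetryDemo {

    // Hypothetical stand-in for a retry wrapper: runs the task until it
    // succeeds or maxAttempts attempts have failed, then rethrows the last error.
    static <T> T callWithRetry(int maxAttempts, Callable<T> task) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    // Counts how often the innermost operation runs when one retry wrapper
    // is nested inside another and every attempt fails.
    static int countInnerAttempts(int maxAttempts) {
        AtomicInteger innerAttempts = new AtomicInteger();

        // Innermost operation: always fails, as if ZK were down.
        Callable<Void> operation = () -> {
            innerAttempts.incrementAndGet();
            throw new IllegalStateException("simulated connection loss");
        };
        // Inner wrapper (plays the role of findProtectedNodeInForeground).
        Callable<Void> inner = () -> callWithRetry(maxAttempts, operation);

        try {
            // Outer wrapper (plays the role of pathInForeground).
            callWithRetry(maxAttempts, inner);
        } catch (Exception expected) {
            // Every attempt fails by construction.
        }
        return innerAttempts.get();
    }

    public static void main(String[] args) {
        System.out.println(countInnerAttempts(20));
    }
}
```

With the same limit applied at both levels, the innermost operation runs 
limit * limit times before the outer call gives up, which matches the 20*20 
behaviour described above.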

 

This class recreates it. If you put breakpoints at the commented sections and 
bring ZK down, you can see the different times until it disconnects, along with 
the stack traces I've included below.

 
{code:java}
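import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessReadWriteLock;
import org.apache.curator.retry.BoundedExponentialBackoffRetry;

public class GoCurator {
    public static void main(String[] args) throws Exception {

        // Retry policy: 200 ms base sleep, 10 s max sleep, 20 retries
        CuratorFramework cf = CuratorFrameworkFactory.newClient(
                "localhost:2181",
                new BoundedExponentialBackoffRetry(200, 10000, 20)
        );
        cf.start();

        String root = "/myRoot";
        if (cf.checkExists().forPath(root) == null) {
            // Stacktrace A showing what happens if ZK is down for this call
            cf.create().forPath(root);
        }

        InterProcessReadWriteLock lock = new InterProcessReadWriteLock(cf, "/grant/myLock");

        // See stacktrace B showing the nested retry if ZK is down for this call
        lock.readLock().acquire();

        lock.readLock().release();

        System.out.println("done");
    }
}
{code}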
 

Stacktrace A (if ZK is down when I'm calling create().forPath). This shows the 
single retry loop, so it exits after the correct number of attempts:

 
{code:java}
 java.lang.Thread.State: WAITING
  at java.lang.Object.wait(Object.java:-1)
  at java.lang.Object.wait(Object.java:502)
  at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1499)
  at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1487)
  at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:2617)
  at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:242)
  at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:231)
  at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64)
  at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)
  at org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:228)
  at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:219)
  at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:41)
  at com.gebatech.curator.GoCurator.main(GoCurator.java:25) {code}
Stacktrace B (if ZK is down when I call 
InterProcessReadWriteLock#readLock#acquire). This shows the nested retry loop, 
so it doesn't exit until 20*20 attempts have been made:

 
{code:java}
 java.lang.Thread.State: WAITING
  at sun.misc.Unsafe.park(Unsafe.java:-1)
  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
  at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
  at org.apache.curator.CuratorZookeeperClient.internalBlockUntilConnectedOrTimedOut(CuratorZookeeperClient.java:434)
  at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:56)
  at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)
  at org.apache.curator.framework.imps.CreateBuilderImpl.findProtectedNodeInForeground(CreateBuilderImpl.java:1239)
  at org.apache.curator.framework.imps.CreateBuilderImpl.access$1700(CreateBuilderImpl.java:51)
  at org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1167)
  at org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1156)
  at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64)
  at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)
  at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:1153)
  at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:607)
  at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:597)
  at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:575)
  at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:51)
  at org.apache.curator.framework.recipes.locks.StandardLockInternalsDriver.createsTheLock(StandardLockInternalsDriver.java:54)
  at org.apache.curator.framework.recipes.locks.LockInternals.attemptLock(LockInternals.java:225)
  at org.apache.curator.framework.recipes.locks.InterProcessMutex.internalLock(InterProcessMutex.java:237)
  at org.apache.curator.framework.recipes.locks.InterProcessMutex.acquire(InterProcessMutex.java:89)
  at com.gebatech.curator.GoCurator.main(GoCurator.java:29) {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
