Amir Gur created CURATOR-194:
--------------------------------

             Summary: Deadlock ConnectionState.checkTimeouts
                 Key: CURATOR-194
                 URL: https://issues.apache.org/jira/browse/CURATOR-194
             Project: Apache Curator
          Issue Type: Bug
          Components: Client
    Affects Versions: 2.6.0
            Reporter: Amir Gur


When ConnectionState.checkTimeouts actually detects a timeout, it calls 'reset',
which (via HandleHolder.closeAndReset and ZooKeeper.close) calls
org.apache.zookeeper.ClientCnxn.close, which sends a
ZooDefs.OpCode.closeSession request.
The calling thread then waits on the packet until SendThread calls 'notifyAll'
on it.

At that point, SendThread is blocked because it is trying to enter the
synchronized method 'ConnectionState.checkTimeouts', whose monitor is still held
by the waiting thread.
So it will never notify the packet.
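
The lock ordering described above can be sketched in isolation. This is only an
illustration, not Curator or ZooKeeper code: the class DeadlockSketch, its
fields 'state' and 'packet', and its methods are hypothetical stand-ins for the
ConnectionState monitor and the ClientCnxn packet. It also assumes a bounded
wait so that the demo terminates; the real wait in the thread dump has no
timeout.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-ins: "state" plays the ConnectionState monitor and
// "packet" plays the ClientCnxn$Packet that the closing thread waits on.
class DeadlockSketch {
    private final Object state = new Object();
    private final Object packet = new Object();
    private boolean finished = false; // guarded by the packet monitor

    // Waits on the packet for at most timeoutMs; true only if it was notified.
    private boolean awaitPacket(long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        synchronized (packet) {
            try {
                while (!finished) {
                    long remaining = deadline - System.currentTimeMillis();
                    if (remaining <= 0) {
                        break; // timed out without a notification
                    }
                    packet.wait(remaining); // releases packet, but NOT state
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return finished;
        }
    }

    private void finishPacket() {
        synchronized (packet) {
            finished = true;
            packet.notifyAll();
        }
    }

    // Returns true if the waiter was notified before its wait timed out.
    static boolean waiterWasNotified(long timeoutMs) throws InterruptedException {
        DeadlockSketch s = new DeadlockSketch();
        CountDownLatch stateLocked = new CountDownLatch(1);
        boolean[] notified = new boolean[1];

        Thread worker = new Thread(() -> {
            synchronized (s.state) {                    // like checkTimeouts -> reset
                stateLocked.countDown();
                notified[0] = s.awaitPacket(timeoutMs); // like the wait inside close
            }
        }, "worker");

        Thread sender = new Thread(() -> {
            try {
                stateLocked.await(); // start only once the worker holds "state"
            } catch (InterruptedException e) {
                return;
            }
            synchronized (s.state) {
                // like the callback that re-enters checkTimeouts on SendThread
            }
            s.finishPacket(); // reached only after the worker releases "state"
        }, "sender");

        worker.start();
        sender.start();
        worker.join();
        sender.join();
        return notified[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("notified before timeout: " + waiterWasNotified(300));
    }
}
```

In this sketch the waiter always times out, because the only thread that could
notify it must first acquire the monitor the waiter is holding. With an
unbounded wait, as in ClientCnxn.submitRequest in the dump below, neither
thread makes progress.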

Here is the thread dump:

"job-scheduler_Worker-19-CheckHealthTask" prio=10 tid=0x00007f260609c000 nid=0x5a97 in Object.wait() [0x00007f25723e1000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x0000000725fc0580> (a org.apache.zookeeper.ClientCnxn$Packet)
        at java.lang.Object.wait(Object.java:503)
        at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342)
        - locked <0x0000000725fc0580> (a org.apache.zookeeper.ClientCnxn$Packet)
        at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1314)
        at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:677)
        - locked <0x0000000723949c88> (a org.apache.zookeeper.ZooKeeper)
        at org.apache.curator.HandleHolder.internalClose(HandleHolder.java:139)
        at org.apache.curator.HandleHolder.closeAndReset(HandleHolder.java:77)
        at org.apache.curator.ConnectionState.reset(ConnectionState.java:218)
        - locked <0x000000071651de48> (a org.apache.curator.ConnectionState)
        at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:194)
        - locked <0x000000071651de48> (a org.apache.curator.ConnectionState)
        at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)
        at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:474)
        at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:172)
        at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:161)
        at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
        at org.apache.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:157)
        at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:148)
        at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:36)
        at com.alu.dal.zooKeeper.ZooKeeperSession.checkHealth(ZooKeeperSession.java:350)
        at com.alu.dal.zooKeeper.ZooKeeperSession.check(ZooKeeperSession.java:86)
        at com.alu.orchestration.cluster.ClusterInstanceServiceImpl.checkQuorum(ClusterInstanceServiceImpl.java:464)
        at com.alu.orchestration.cluster.ClusterInstanceServiceImpl.checkHealthState(ClusterInstanceServiceImpl.java:400)
        at com.alu.tasks.health.CheckHealthTaskImpl.doWork(CheckHealthTaskImpl.java:37)
        at com.alu.scheduler.JobSchedulerDetails$QuartzJob.executeInternal(JobSchedulerDetails.java:95)
        at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:114)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:216)
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)


"localhost-startStop-1-SendThread(11.1.1.11:2181)" daemon prio=10 tid=0x00007f257c61a000 nid=0x7c3 waiting for monitor entry [0x00007f2562e65000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:177)
        - waiting to lock <0x000000071651de48> (a org.apache.curator.ConnectionState)
        at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)
        at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:793)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.doSyncForSuspendedConnection(CuratorFrameworkImpl.java:668)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$800(CuratorFrameworkImpl.java:58)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl$7.retriesExhausted(CuratorFrameworkImpl.java:664)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:683)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:496)
        at org.apache.curator.framework.imps.BackgroundSyncImpl$1.processResult(BackgroundSyncImpl.java:50)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:609)
        at org.apache.zookeeper.ClientCnxn$EventThread.queuePacket(ClientCnxn.java:478)
        - locked <0x0000000714935b18> (a java.util.concurrent.LinkedBlockingQueue)
        at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:630)
        at org.apache.zookeeper.ClientCnxn.conLossPacket(ClientCnxn.java:648)
        at org.apache.zookeeper.ClientCnxn.access$2400(ClientCnxn.java:85)
        at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1194)
        - locked <0x000000071b205bf0> (a java.util.LinkedList)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1122)




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
