Amir Gur created CURATOR-194:
--------------------------------
Summary: Deadlock ConnectionState.checkTimeouts
Key: CURATOR-194
URL: https://issues.apache.org/jira/browse/CURATOR-194
Project: Apache Curator
Issue Type: Bug
Components: Client
Affects Versions: 2.6.0
Reporter: Amir Gur
When ConnectionState.checkTimeouts detects a timeout, it calls 'reset',
which (via HandleHolder.closeAndReset and ZooKeeper.close) calls
org.apache.zookeeper.ClientCnxn.close. That sends a
ZooDefs.OpCode.closeSession request and then waits on the request packet
until SendThread calls 'notifyAll' on it.
Meanwhile, SendThread is blocked trying to enter the synchronized method
'ConnectionState.checkTimeouts', whose monitor is held by the waiting thread.
So SendThread never notifies the packet, and both threads hang.
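The lock interaction described above can be reproduced in isolation. The following is a minimal sketch, not Curator or ZooKeeper code: the names Packet, State, checkTimeoutsThenClose and sendThreadStep are hypothetical stand-ins, and a timed wait replaces the unbounded wait seen in the real trace so that the sketch terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-ins; none of these types are Curator or ZooKeeper classes.
final class CheckTimeoutsDeadlockSketch {

    /** Plays the role of ClientCnxn.Packet: the closing thread waits on it. */
    static final class Packet {
        boolean finished;
    }

    /** Plays the role of ConnectionState: one monitor guards checkTimeouts(). */
    static final class State {
        final Packet closePacket = new Packet();
        final CountDownLatch callerIsWaiting = new CountDownLatch(1);

        // Caller path: holds the State monitor for the whole wait on the packet.
        // Assumption: a bounded wait is used here only so the sketch can finish.
        synchronized boolean checkTimeoutsThenClose(long maxWaitMs) throws InterruptedException {
            synchronized (closePacket) {
                callerIsWaiting.countDown();
                long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMs);
                while (!closePacket.finished) {
                    long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                    if (remainingMs <= 0) {
                        return false; // never notified while the State monitor was held
                    }
                    // Releases the packet monitor only; the State monitor stays held.
                    closePacket.wait(remainingMs);
                }
                return true;
            }
        }

        synchronized void checkTimeouts() { }

        // Notifier path: must enter the State monitor before it can notify.
        void sendThreadStep() {
            checkTimeouts();
            synchronized (closePacket) {
                closePacket.finished = true;
                closePacket.notifyAll();
            }
        }
    }

    /** Returns { notified while monitor held, notified after monitor released }. */
    static boolean[] run(long maxWaitMs) {
        State state = new State();
        Thread sender = new Thread(() -> {
            try {
                state.callerIsWaiting.await();
            } catch (InterruptedException e) {
                return;
            }
            state.sendThreadStep();
        }, "SendThread-sketch");
        sender.start();
        try {
            boolean notifiedWhileLocked = state.checkTimeoutsThenClose(maxWaitMs);
            sender.join(); // the notifier can proceed once the monitor is free
            boolean notifiedAfterUnlock;
            synchronized (state.closePacket) {
                notifiedAfterUnlock = state.closePacket.finished;
            }
            return new boolean[] { notifiedWhileLocked, notifiedAfterUnlock };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = run(300);
        System.out.println("notified while monitor held: " + result[0]);
        System.out.println("notified after monitor released: " + result[1]);
    }
}
```

The first value is false because the notifier cannot enter the monitor while the caller holds it. The second is true because the notifier proceeds as soon as the monitor is released. With an unbounded wait, as in the trace, the caller would never return.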
Here is the thread dump:
"job-scheduler_Worker-19-CheckHealthTask" prio=10 tid=0x00007f260609c000 nid=0x5a97 in Object.wait() [0x00007f25723e1000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x0000000725fc0580> (a org.apache.zookeeper.ClientCnxn$Packet)
at java.lang.Object.wait(Object.java:503)
at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342)
- locked <0x0000000725fc0580> (a org.apache.zookeeper.ClientCnxn$Packet)
at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1314)
at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:677)
- locked <0x0000000723949c88> (a org.apache.zookeeper.ZooKeeper)
at org.apache.curator.HandleHolder.internalClose(HandleHolder.java:139)
at org.apache.curator.HandleHolder.closeAndReset(HandleHolder.java:77)
at org.apache.curator.ConnectionState.reset(ConnectionState.java:218)
- locked <0x000000071651de48> (a org.apache.curator.ConnectionState)
at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:194)
- locked <0x000000071651de48> (a org.apache.curator.ConnectionState)
at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)
at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:474)
at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:172)
at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:161)
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
at org.apache.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:157)
at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:148)
at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:36)
at com.alu.dal.zooKeeper.ZooKeeperSession.checkHealth(ZooKeeperSession.java:350)
at com.alu.dal.zooKeeper.ZooKeeperSession.check(ZooKeeperSession.java:86)
at com.alu.orchestration.cluster.ClusterInstanceServiceImpl.checkQuorum(ClusterInstanceServiceImpl.java:464)
at com.alu.orchestration.cluster.ClusterInstanceServiceImpl.checkHealthState(ClusterInstanceServiceImpl.java:400)
at com.alu.tasks.health.CheckHealthTaskImpl.doWork(CheckHealthTaskImpl.java:37)
at com.alu.scheduler.JobSchedulerDetails$QuartzJob.executeInternal(JobSchedulerDetails.java:95)
at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:114)
at org.quartz.core.JobRunShell.run(JobRunShell.java:216)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)

"localhost-startStop-1-SendThread(11.1.1.11:2181)" daemon prio=10 tid=0x00007f257c61a000 nid=0x7c3 waiting for monitor entry [0x00007f2562e65000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:177)
- waiting to lock <0x000000071651de48> (a org.apache.curator.ConnectionState)
at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)
at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:793)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.doSyncForSuspendedConnection(CuratorFrameworkImpl.java:668)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$800(CuratorFrameworkImpl.java:58)
at org.apache.curator.framework.imps.CuratorFrameworkImpl$7.retriesExhausted(CuratorFrameworkImpl.java:664)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:683)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:496)
at org.apache.curator.framework.imps.BackgroundSyncImpl$1.processResult(BackgroundSyncImpl.java:50)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:609)
at org.apache.zookeeper.ClientCnxn$EventThread.queuePacket(ClientCnxn.java:478)
- locked <0x0000000714935b18> (a java.util.concurrent.LinkedBlockingQueue)
at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:630)
at org.apache.zookeeper.ClientCnxn.conLossPacket(ClientCnxn.java:648)
at org.apache.zookeeper.ClientCnxn.access$2400(ClientCnxn.java:85)
at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1194)
- locked <0x000000071b205bf0> (a java.util.LinkedList)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1122)