[ https://issues.apache.org/jira/browse/KAFKA-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neha Narkhede closed KAFKA-702.
-------------------------------
> Deadlock between request handler/processor threads
> --------------------------------------------------
>
> Key: KAFKA-702
> URL: https://issues.apache.org/jira/browse/KAFKA-702
> Project: Kafka
> Issue Type: Bug
> Components: network
> Affects Versions: 0.8
> Reporter: Joel Koshy
> Assignee: Jay Kreps
> Priority: Blocker
> Labels: bugs
> Fix For: 0.8
>
> Attachments: KAFKA-702-v1.patch
>
>
> We have seen this a couple of times in the past few days in a test cluster.
> The request handler and processor threads deadlock on the request/response
> queues, bringing the server to a halt:
>
> "kafka-processor-10251-7" prio=10 tid=0x00007f4a0c3c9800 nid=0x4c39 waiting on condition [0x00007f46f698e000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00007f48c9dd2698> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>     at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:252)
>     at kafka.network.RequestChannel.sendRequest(RequestChannel.scala:107)
>     at kafka.network.Processor.read(SocketServer.scala:321)
>     at kafka.network.Processor.run(SocketServer.scala:231)
>     at java.lang.Thread.run(Thread.java:619)
>
> "kafka-request-handler-7" daemon prio=10 tid=0x00007f4a0c57f000 nid=0x4c47 waiting on condition [0x00007f46f5b80000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00007f48c9dd6348> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>     at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:252)
>     at kafka.network.RequestChannel.sendResponse(RequestChannel.scala:112)
>     at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:198)
>     at kafka.server.KafkaApis.handle(KafkaApis.scala:58)
>     at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:41)
>     at java.lang.Thread.run(Thread.java:619)
>
> This is because there is a cycle in the wait-for graph of processor threads
> and request handler threads. If request handling slows down on a busy server,
> the request queue fills up, and all processor threads quickly block on adding
> incoming requests to it. While blocked, those threads stop processing
> responses, so their response queues fill up as well. At that point the
> request handler threads start blocking on adding responses to the respective
> response queues. The result is a deadlock: every thread is blocked waiting
> for space in one full queue, and that queue can only be drained by threads
> that are themselves blocked on the other queue. This brings the server to a
> halt where it still accepts connections, but every request times out.
>
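The cycle described above can be reproduced outside Kafka with two bounded queues. The following is a minimal Java sketch, not Kafka's code: the class name `QueueCycleDemo`, the queue capacities of 1, and the assumption that a processor enqueues a batch of 4 requests before it drains responses are all illustrative choices. Once the total buffering (one request slot, one response slot, and one response held by the blocked handler) is smaller than the batch, both threads park in `ArrayBlockingQueue.put()` for good, mirroring the two thread dumps.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueCycleDemo {

    /**
     * Runs one processor thread and one handler thread against bounded queues and
     * reports whether both are parked (WAITING) after waitMillis.
     */
    public static boolean bothStuck(int batchSize, long waitMillis) {
        // Hypothetical stand-ins for the request queue and one processor's response queue.
        BlockingQueue<String> requestQueue = new ArrayBlockingQueue<>(1);
        BlockingQueue<String> responseQueue = new ArrayBlockingQueue<>(1);

        // Processor: enqueues a batch of requests, then drains its responses.
        // Once put() blocks, the drain step is never reached.
        Thread processor = new Thread(() -> {
            try {
                while (true) {
                    for (int i = 0; i < batchSize; i++) {
                        requestQueue.put("request-" + i);
                    }
                    while (responseQueue.poll() != null) {
                        // Writing the response to the socket would happen here.
                    }
                }
            } catch (InterruptedException e) {
                // Interrupted by the demo: exit.
            }
        }, "processor");

        // Request handler: takes a request and blocks until its response fits.
        Thread handler = new Thread(() -> {
            try {
                while (true) {
                    String request = requestQueue.take();
                    responseQueue.put("response-to-" + request);
                }
            } catch (InterruptedException e) {
                // Interrupted by the demo: exit.
            }
        }, "handler");

        processor.setDaemon(true);
        handler.setDaemon(true);
        processor.start();
        handler.start();
        try {
            Thread.sleep(waitMillis);
            // Both threads parked on a full queue means the cycle has closed.
            return processor.getState() == Thread.State.WAITING
                    && handler.getState() == Thread.State.WAITING;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            processor.interrupt();
            handler.interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("batch of 4 deadlocks: " + bothStuck(4, 500));
    }
}
```

With a batch of 4, the deadlock does not depend on thread timing. The processor can only complete three `put()` calls before the fourth blocks, and the handler is then stuck trying to place a second response into a full response queue.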
> One way to resolve this is to break the cycle in the wait-for graph of the
> request handler and processor threads. Instead of having the processor
> threads dispatch the responses, we can have one or more dedicated response
> handler threads that dequeue responses from the queue and write them to the
> socket. One downside of this approach is that access to the selector will
> then have to be synchronized.
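The dedicated-response-thread idea can be sketched the same way. This is a minimal illustration under the same assumptions (hypothetical class name `ResponseThreadDemo`, capacity-1 queues), not the patch attached to this issue. The processor now only enqueues requests, and a separate responder thread drains the response queue, so the handler's `put()` can always make progress. Real socket writes and the selector synchronization mentioned above are not modelled.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ResponseThreadDemo {

    /** Returns true if all `requests` responses are "written" within waitMillis. */
    public static boolean allDelivered(int requests, long waitMillis) {
        // Same tiny capacities that deadlocked when the processor also drained responses.
        BlockingQueue<String> requestQueue = new ArrayBlockingQueue<>(1);
        BlockingQueue<String> responseQueue = new ArrayBlockingQueue<>(1);
        CountDownLatch written = new CountDownLatch(requests);

        // Processor: only reads requests; it never touches responseQueue.
        Thread processor = new Thread(() -> {
            try {
                for (int i = 0; i < requests; i++) {
                    requestQueue.put("request-" + i);
                }
            } catch (InterruptedException e) {
                // Interrupted by the demo: exit.
            }
        }, "processor");

        // Request handler: unchanged from the deadlock sketch.
        Thread handler = new Thread(() -> {
            try {
                while (true) {
                    String request = requestQueue.take();
                    responseQueue.put("response-to-" + request);
                }
            } catch (InterruptedException e) {
                // Interrupted by the demo: exit.
            }
        }, "handler");

        // Dedicated responder: drains responses independently of the processor.
        // In a real server it would write to sockets registered with the
        // processor's selector, which is where the extra synchronization is needed.
        Thread responder = new Thread(() -> {
            try {
                while (true) {
                    responseQueue.take(); // stand-in for writing to the socket
                    written.countDown();
                }
            } catch (InterruptedException e) {
                // Interrupted by the demo: exit.
            }
        }, "responder");

        Thread[] threads = {processor, handler, responder};
        for (Thread t : threads) {
            t.setDaemon(true);
            t.start();
        }
        try {
            return written.await(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            for (Thread t : threads) {
                t.interrupt();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("all responses delivered: " + allDelivered(1000, 5000));
    }
}
```

The processor no longer waits on anything it is responsible for draining, so the wait-for cycle from the thread dumps cannot form here. Whether a single responder thread or a pool is used, and how it coordinates with the selector, is left open in this sketch.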