Sukumar Enuguri created KAFKA-8740:
--------------------------------------

Summary: Threads causing circular deadlock
Key: KAFKA-8740
URL: https://issues.apache.org/jira/browse/KAFKA-8740
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 0.11.0.0
Environment:
  OS: CentOS Linux release 7.5.1804 (Core)
  Kernel: 3.10.0-862.6.3.el7.x86_64
  Java Version: java version "1.8.0_66"
    Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
    Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)
  Hardware: virtual machine on Nutanix hypervisor
Reporter: Sukumar Enuguri

Hi,

We have a six-node cluster. From time to time, on one particular node, the connections to the broker get stuck in CLOSE_WAIT. When we took thread dumps of that broker and analyzed them, we found that the threads below are in a circular deadlock.

* Threads causing circular deadlock: *executor-Heartbeat* --> *kafka-request-handler-7* --> *kafka-request-handler-1* --> *kafka-request-handler-7*

h2. {color:#cc3300}executor-Heartbeat{color}

priority:5 - threadId:0x00007fa04c076000 - nativeId:0x3277b - nativeId (decimal):206715 - state:BLOCKED
stackTrace:
java.lang.Thread.State: BLOCKED (on object monitor)
at kafka.coordinator.group.GroupCoordinator.onExpireHeartbeat({color:#000080}GroupCoordinator.scala:777{color})
- waiting to lock *<0x00000006d4d81288>* (a kafka.coordinator.group.GroupMetadata)
at kafka.coordinator.group.DelayedHeartbeat.onExpiration({color:#000080}DelayedHeartbeat.scala:38{color})
at kafka.server.DelayedOperation.run({color:#000080}DelayedOperation.scala:113{color})
at java.util.concurrent.Executors$RunnableAdapter.call({color:#000080}Executors.java:511{color})
at java.util.concurrent.FutureTask.run({color:#000080}FutureTask.java:266{color})
at java.util.concurrent.ThreadPoolExecutor.runWorker({color:#000080}ThreadPoolExecutor.java:1142{color})
at java.util.concurrent.ThreadPoolExecutor$Worker.run({color:#000080}ThreadPoolExecutor.java:617{color})
at java.lang.Thread.run({color:#000080}Thread.java:745{color})
Locked ownable synchronizers:
- *<0x0000000727100b98>* (a java.util.concurrent.ThreadPoolExecutor$Worker)

h2. {color:#cc3300}kafka-request-handler-7{color}

priority:5 - threadId:0x00007fa0d580e000 - nativeId:0x1873f - nativeId (decimal):100159 - state:BLOCKED
stackTrace:
java.lang.Thread.State: BLOCKED (on object monitor)
at kafka.server.DelayedProduce.safeTryComplete({color:#000080}DelayedProduce.scala:75{color})
- waiting to lock *<0x00000006d4d7a8e0>* (a kafka.coordinator.group.GroupMetadata)
at kafka.server.DelayedOperationPurgatory$Watchers.tryCompleteWatched({color:#000080}DelayedOperation.scala:338{color})
at kafka.server.DelayedOperationPurgatory.checkAndComplete({color:#000080}DelayedOperation.scala:244{color})
at kafka.server.ReplicaManager.tryCompleteDelayedProduce({color:#000080}ReplicaManager.scala:250{color})
at kafka.cluster.Partition.tryCompleteDelayedRequests({color:#000080}Partition.scala:418{color})
at kafka.cluster.Partition.appendRecordsToLeader({color:#000080}Partition.scala:500{color})
at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply({color:#000080}ReplicaManager.scala:546{color})
at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply({color:#000080}ReplicaManager.scala:532{color})
at scala.collection.TraversableLike$$anonfun$map$1.apply({color:#000080}TraversableLike.scala:234{color})
at scala.collection.TraversableLike$$anonfun$map$1.apply({color:#000080}TraversableLike.scala:234{color})
at scala.collection.immutable.Map$Map1.foreach({color:#000080}Map.scala:116{color})
at scala.collection.TraversableLike$class.map({color:#000080}TraversableLike.scala:234{color})
at scala.collection.AbstractTraversable.map({color:#000080}Traversable.scala:104{color})
at kafka.server.ReplicaManager.appendToLocalLog({color:#000080}ReplicaManager.scala:532{color})
at kafka.server.ReplicaManager.appendRecords({color:#000080}ReplicaManager.scala:373{color})
at kafka.coordinator.group.GroupMetadataManager.appendForGroup({color:#000080}GroupMetadataManager.scala:239{color})
at kafka.coordinator.group.GroupMetadataManager.storeOffsets({color:#000080}GroupMetadataManager.scala:381{color})
at kafka.coordinator.group.GroupCoordinator.doCommitOffsets({color:#000080}GroupCoordinator.scala:465{color})
- locked *<0x00000006d4d81288>* (a kafka.coordinator.group.GroupMetadata)
at kafka.coordinator.group.GroupCoordinator.handleCommitOffsets({color:#000080}GroupCoordinator.scala:428{color})
at kafka.server.KafkaApis.handleOffsetCommitRequest({color:#000080}KafkaApis.scala:356{color})
at kafka.server.KafkaApis.handle({color:#000080}KafkaApis.scala:105{color})
at kafka.server.KafkaRequestHandler.run({color:#000080}KafkaRequestHandler.scala:66{color})
at java.lang.Thread.run({color:#000080}Thread.java:745{color})
Locked ownable synchronizers:
- None

h2. {color:#cc3300}kafka-request-handler-1{color}

priority:5 - threadId:0x00007fa0d5803000 - nativeId:0x18739 - nativeId (decimal):100153 - state:BLOCKED
stackTrace:
java.lang.Thread.State: BLOCKED (on object monitor)
at kafka.server.DelayedProduce.safeTryComplete({color:#000080}DelayedProduce.scala:75{color})
- waiting to lock *<0x00000006d4d81288>* (a kafka.coordinator.group.GroupMetadata)
at kafka.server.DelayedOperationPurgatory$Watchers.tryCompleteWatched({color:#000080}DelayedOperation.scala:338{color})
at kafka.server.DelayedOperationPurgatory.checkAndComplete({color:#000080}DelayedOperation.scala:244{color})
at kafka.server.ReplicaManager.tryCompleteDelayedProduce({color:#000080}ReplicaManager.scala:250{color})
at kafka.cluster.Partition.tryCompleteDelayedRequests({color:#000080}Partition.scala:418{color})
at kafka.cluster.Partition.appendRecordsToLeader({color:#000080}Partition.scala:500{color})
at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply({color:#000080}ReplicaManager.scala:546{color})
at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply({color:#000080}ReplicaManager.scala:532{color})
at scala.collection.TraversableLike$$anonfun$map$1.apply({color:#000080}TraversableLike.scala:234{color})
at scala.collection.TraversableLike$$anonfun$map$1.apply({color:#000080}TraversableLike.scala:234{color})
at scala.collection.immutable.Map$Map1.foreach({color:#000080}Map.scala:116{color})
at scala.collection.TraversableLike$class.map({color:#000080}TraversableLike.scala:234{color})
at scala.collection.AbstractTraversable.map({color:#000080}Traversable.scala:104{color})
at kafka.server.ReplicaManager.appendToLocalLog({color:#000080}ReplicaManager.scala:532{color})
at kafka.server.ReplicaManager.appendRecords({color:#000080}ReplicaManager.scala:373{color})
at kafka.coordinator.group.GroupMetadataManager.appendForGroup({color:#000080}GroupMetadataManager.scala:239{color})
at kafka.coordinator.group.GroupMetadataManager.storeOffsets({color:#000080}GroupMetadataManager.scala:381{color})
at kafka.coordinator.group.GroupCoordinator.doCommitOffsets({color:#000080}GroupCoordinator.scala:465{color})
- locked *<0x00000006d4d7a8e0>* (a kafka.coordinator.group.GroupMetadata)
at kafka.coordinator.group.GroupCoordinator.handleCommitOffsets({color:#000080}GroupCoordinator.scala:428{color})
at kafka.server.KafkaApis.handleOffsetCommitRequest({color:#000080}KafkaApis.scala:356{color})
at kafka.server.KafkaApis.handle({color:#000080}KafkaApis.scala:105{color})
at kafka.server.KafkaRequestHandler.run({color:#000080}KafkaRequestHandler.scala:66{color})
at java.lang.Thread.run({color:#000080}Thread.java:745{color})
Locked ownable synchronizers:
- None

* Threads causing circular deadlock: *group-metadata-manager-0* --> *kafka-request-handler-7* --> *kafka-request-handler-1* --> *kafka-request-handler-7*
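The first chain can be checked mechanically from the "locked" and "waiting to lock" lines in the traces above: kafka-request-handler-7 holds <0x00000006d4d81288> and waits for <0x00000006d4d7a8e0>, kafka-request-handler-1 holds <0x00000006d4d7a8e0> and waits for <0x00000006d4d81288>, and executor-Heartbeat is queued behind <0x00000006d4d81288>. The following is only a minimal sketch of that check. The class and method names (WaitForCycle, findCycle, waitingFor, owner) are made up for illustration and are not part of Kafka; the thread names and monitor addresses are copied from the dump, and the sketch assumes each blocked thread waits for exactly one monitor.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not Kafka code: walks a wait-for chain taken from a thread dump.
class WaitForCycle {

    /**
     * Follows "thread -> monitor it waits for -> thread that owns that monitor",
     * starting at `start`. Returns the threads that form the cycle the chain runs
     * into, or an empty list if the chain ends at a monitor nobody in the map owns.
     */
    static List<String> findCycle(String start,
                                  Map<String, String> waitingFor,
                                  Map<String, String> owner) {
        List<String> path = new ArrayList<>();
        String current = start;
        while (current != null && !path.contains(current)) {
            path.add(current);
            String monitor = waitingFor.get(current);
            current = (monitor == null) ? null : owner.get(monitor);
        }
        if (current == null) {
            return new ArrayList<>();
        }
        // Threads before the first repeated one only wait on the cycle; drop them.
        return new ArrayList<>(path.subList(path.indexOf(current), path.size()));
    }

    /** "waiting to lock" entries from the dump: thread name -> monitor address. */
    static Map<String, String> waitingFor() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("executor-Heartbeat", "0x00000006d4d81288");
        m.put("kafka-request-handler-7", "0x00000006d4d7a8e0");
        m.put("kafka-request-handler-1", "0x00000006d4d81288");
        return m;
    }

    /** "locked" entries from the dump: monitor address -> owning thread name. */
    static Map<String, String> owner() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("0x00000006d4d81288", "kafka-request-handler-7");
        m.put("0x00000006d4d7a8e0", "kafka-request-handler-1");
        return m;
    }

    public static void main(String[] args) {
        // prints [kafka-request-handler-7, kafka-request-handler-1]
        System.out.println(findCycle("executor-Heartbeat", waitingFor(), owner()));
    }
}
```

Under these assumptions the two request handlers form the actual cycle, each holding the GroupMetadata monitor the other one needs, while executor-Heartbeat is only blocked behind it. On a live JVM, java.lang.management.ThreadMXBean.findDeadlockedThreads() reports monitor cycles of this kind without needing a manual walk of the dump.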