[ 
https://issues.apache.org/jira/browse/KAFKA-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897864#comment-16897864
 ] 

Manikumar commented on KAFKA-8740:
----------------------------------

Some of the deadlock issues have been fixed in newer versions. It's recommended 
to move to a recent stable version.

> Threads causing circular deadlock 
> ----------------------------------
>
>                 Key: KAFKA-8740
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8740
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.11.0.0
>         Environment: OS: CentOS Linux release 7.5.1804 (Core)
> Kernel: 3.10.0-862.6.3.el7.x86_64
> Java Version:
> java version "1.8.0_66"
> Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)
> Hardware: virtual machine on nutanix hypervisor
>            Reporter: Sukumar Enuguri
>            Priority: Critical
>         Attachments: threaddumps-29Jul2019.tar, threaddumps-31Jul2019.tar
>
>
> Hi,
> We have a six-node cluster. From time to time, on one particular node, we 
> see the connections to the broker go into CLOSE_WAIT. When we took thread 
> dumps of the broker and analyzed them, we found that the threads are in a 
> circular deadlock.
>  
>  * Threads causing circular deadlock: *executor-Heartbeat* --> 
> *kafka-request-handler-7* --> *kafka-request-handler-1* --> 
> *kafka-request-handler-7*
> executor-Heartbeat
> ------------------
> priority:5 - threadId:0x00007fa04c076000 - nativeId:0x3277b - nativeId (decimal):206715 - state:BLOCKED
> stackTrace:
> java.lang.Thread.State: BLOCKED (on object monitor)
> at kafka.coordinator.group.GroupCoordinator.onExpireHeartbeat(GroupCoordinator.scala:777)
> - waiting to lock *<0x00000006d4d81288>* (a kafka.coordinator.group.GroupMetadata)
> at kafka.coordinator.group.DelayedHeartbeat.onExpiration(DelayedHeartbeat.scala:38)
> at kafka.server.DelayedOperation.run(DelayedOperation.scala:113)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Locked ownable synchronizers:
> - *<0x0000000727100b98>* (a java.util.concurrent.ThreadPoolExecutor$Worker)
> kafka-request-handler-7
> -----------------------
> priority:5 - threadId:0x00007fa0d580e000 - nativeId:0x1873f - nativeId (decimal):100159 - state:BLOCKED
> stackTrace:
> java.lang.Thread.State: BLOCKED (on object monitor)
> at kafka.server.DelayedProduce.safeTryComplete(DelayedProduce.scala:75)
> - waiting to lock *<0x00000006d4d7a8e0>* (a kafka.coordinator.group.GroupMetadata)
> at kafka.server.DelayedOperationPurgatory$Watchers.tryCompleteWatched(DelayedOperation.scala:338)
> at kafka.server.DelayedOperationPurgatory.checkAndComplete(DelayedOperation.scala:244)
> at kafka.server.ReplicaManager.tryCompleteDelayedProduce(ReplicaManager.scala:250)
> at kafka.cluster.Partition.tryCompleteDelayedRequests(Partition.scala:418)
> at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:500)
> at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:546)
> at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:532)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.AbstractTraversable.map(Traversable.scala:104)
> at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:532)
> at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:373)
> at kafka.coordinator.group.GroupMetadataManager.appendForGroup(GroupMetadataManager.scala:239)
> at kafka.coordinator.group.GroupMetadataManager.storeOffsets(GroupMetadataManager.scala:381)
> at kafka.coordinator.group.GroupCoordinator.doCommitOffsets(GroupCoordinator.scala:465)
> - locked *<0x00000006d4d81288>* (a kafka.coordinator.group.GroupMetadata)
> at kafka.coordinator.group.GroupCoordinator.handleCommitOffsets(GroupCoordinator.scala:428)
> at kafka.server.KafkaApis.handleOffsetCommitRequest(KafkaApis.scala:356)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:105)
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:66)
> at java.lang.Thread.run(Thread.java:745)
> Locked ownable synchronizers:
> - None
> kafka-request-handler-1
> -----------------------
> priority:5 - threadId:0x00007fa0d5803000 - nativeId:0x18739 - nativeId (decimal):100153 - state:BLOCKED
> stackTrace:
> java.lang.Thread.State: BLOCKED (on object monitor)
> at kafka.server.DelayedProduce.safeTryComplete(DelayedProduce.scala:75)
> - waiting to lock *<0x00000006d4d81288>* (a kafka.coordinator.group.GroupMetadata)
> at kafka.server.DelayedOperationPurgatory$Watchers.tryCompleteWatched(DelayedOperation.scala:338)
> at kafka.server.DelayedOperationPurgatory.checkAndComplete(DelayedOperation.scala:244)
> at kafka.server.ReplicaManager.tryCompleteDelayedProduce(ReplicaManager.scala:250)
> at kafka.cluster.Partition.tryCompleteDelayedRequests(Partition.scala:418)
> at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:500)
> at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:546)
> at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:532)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.AbstractTraversable.map(Traversable.scala:104)
> at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:532)
> at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:373)
> at kafka.coordinator.group.GroupMetadataManager.appendForGroup(GroupMetadataManager.scala:239)
> at kafka.coordinator.group.GroupMetadataManager.storeOffsets(GroupMetadataManager.scala:381)
> at kafka.coordinator.group.GroupCoordinator.doCommitOffsets(GroupCoordinator.scala:465)
> - locked *<0x00000006d4d7a8e0>* (a kafka.coordinator.group.GroupMetadata)
> at kafka.coordinator.group.GroupCoordinator.handleCommitOffsets(GroupCoordinator.scala:428)
> at kafka.server.KafkaApis.handleOffsetCommitRequest(KafkaApis.scala:356)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:105)
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:66)
> at java.lang.Thread.run(Thread.java:745)
> Locked ownable synchronizers:
> - None
>  * Threads causing circular deadlock: *group-metadata-manager-0* --> 
> *kafka-request-handler-7* --> *kafka-request-handler-1* --> 
> *kafka-request-handler-7*
>  
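
The chain in the report can be reproduced by following each "waiting to lock" address to the thread whose dump shows the same address as "locked". Below is a minimal sketch in Java of that walk, using only the thread names and monitor addresses quoted in the report. The class WaitForGraph and its methods are hypothetical helpers written for illustration; they are not part of Kafka or the JDK.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: models "waiting to lock" / "locked" lines of a thread dump. */
final class WaitForGraph {
    // thread name -> address of the monitor it is blocked on
    private final Map<String, String> waitingOn = new HashMap<>();
    // monitor address -> name of the thread that currently holds it
    private final Map<String, String> heldBy = new HashMap<>();

    void waiting(String thread, String monitor) { waitingOn.put(thread, monitor); }

    void holds(String thread, String monitor) { heldBy.put(monitor, thread); }

    /**
     * Follows thread -> awaited monitor -> owning thread, starting at {@code start}.
     * If the walk revisits a thread, that thread is appended once more to close the cycle.
     */
    List<String> chainFrom(String start) {
        List<String> chain = new ArrayList<>();
        String current = start;
        while (current != null && !chain.contains(current)) {
            chain.add(current);
            String monitor = waitingOn.get(current);
            current = (monitor == null) ? null : heldBy.get(monitor);
        }
        if (current != null) {
            chain.add(current); // repeated thread: the wait-for chain is circular
        }
        return chain;
    }

    /** True when the chain from {@code start} ends on a thread it already visited. */
    boolean hasCycle(String start) {
        List<String> chain = chainFrom(start);
        String last = chain.get(chain.size() - 1);
        return chain.indexOf(last) < chain.size() - 1;
    }

    /** Monitor addresses and owners copied from the thread dumps in the report. */
    static WaitForGraph fromReport() {
        WaitForGraph g = new WaitForGraph();
        g.waiting("executor-Heartbeat", "0x00000006d4d81288");
        g.waiting("kafka-request-handler-7", "0x00000006d4d7a8e0");
        g.holds("kafka-request-handler-7", "0x00000006d4d81288");
        g.waiting("kafka-request-handler-1", "0x00000006d4d81288");
        g.holds("kafka-request-handler-1", "0x00000006d4d7a8e0");
        return g;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" --> ", fromReport().chainFrom("executor-Heartbeat")));
    }
}
```

With the data from the report, the walk from executor-Heartbeat yields the same chain listed in the first bullet. The cycle itself is between kafka-request-handler-7 and kafka-request-handler-1: each holds the GroupMetadata monitor the other is waiting for, and executor-Heartbeat is blocked behind the monitor held by kafka-request-handler-7.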



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
