chia7712 commented on PR #18053: URL: https://github.com/apache/kafka/pull/18053#issuecomment-2520557508
OK, I ran into the deadlock I mentioned earlier (https://github.com/apache/kafka/pull/17957#discussion_r1864856887):

```
"PersisterStateManager":
  waiting for ownable synchronizer 0x0000000081c1dc30, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "data-plane-kafka-request-handler-6"
"data-plane-kafka-request-handler-6":
  waiting for ownable synchronizer 0x0000000081ce0058, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
  which is held by "PersisterStateManager"
```

This can easily be reproduced by running two share consumers concurrently. The `PersisterStateManager` thread holds the write lock of the share partition and waits for the lock of the delayed share fetch while executing `completeDelayedShareFetchRequest`:

```
"PersisterStateManager":
	at jdk.internal.misc.Unsafe.park(Native Method)
	- parking to wait for <0x0000000081c1dc30> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:221)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:754)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:990)
	at java.util.concurrent.locks.ReentrantLock$Sync.lock(ReentrantLock.java:153)
	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:322)
	at org.apache.kafka.server.purgatory.DelayedOperation.safeTryComplete(DelayedOperation.java:134)
	at org.apache.kafka.server.purgatory.DelayedOperationPurgatory$Watchers.tryCompleteWatched(DelayedOperationPurgatory.java:336)
	at org.apache.kafka.server.purgatory.DelayedOperationPurgatory.checkAndComplete(DelayedOperationPurgatory.java:187)
	at kafka.server.ReplicaManager.completeDelayedShareFetchRequest(ReplicaManager.scala:496)
	at kafka.server.share.SharePartitionManager.lambda$processShareFetch$12(SharePartitionManager.java:590)
	at kafka.server.share.SharePartitionManager$$Lambda/0x00007fc60c6890f0.accept(Unknown Source)
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
	at kafka.server.share.SharePartition.lambda$maybeInitialize$0(SharePartition.java:482)
```

Meanwhile, `data-plane-kafka-request-handler-6` holds the lock of the delayed share fetch and waits for the read lock of the share partition. It cannot get that read lock while `PersisterStateManager` holds the write lock:

```
"data-plane-kafka-request-handler-6":
	at jdk.internal.misc.Unsafe.park(Native Method)
	- parking to wait for <0x0000000081ce0058> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:221)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:754)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1079)
	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:738)
	at kafka.server.share.SharePartition.partitionState(SharePartition.java:1642)
	at kafka.server.share.SharePartition.stateNotActive(SharePartition.java:1142)
	at kafka.server.share.SharePartition.maybeAcquireFetchLock(SharePartition.java:1107)
	at kafka.server.share.DelayedShareFetch.lambda$acquirablePartitions$2(DelayedShareFetch.java:212)
	at kafka.server.share.DelayedShareFetch$$Lambda/0x00007fc60c689320.accept(Unknown Source)
	at java.util.LinkedHashMap.forEach(LinkedHashMap.java:986)
	at kafka.server.share.DelayedShareFetch.acquirablePartitions(DelayedShareFetch.java:208)
	at kafka.server.share.DelayedShareFetch.tryComplete(DelayedShareFetch.java:160)
	at org.apache.kafka.server.purgatory.DelayedOperation.safeTryComplete(DelayedOperation.java:137)
	at org.apache.kafka.server.purgatory.DelayedOperationPurgatory$Watchers.tryCompleteWatched(DelayedOperationPurgatory.java:336)
	at org.apache.kafka.server.purgatory.DelayedOperationPurgatory.checkAndComplete(DelayedOperationPurgatory.java:187)
	at kafka.server.ReplicaManager.$anonfun$addCompletePurgatoryAction$2(ReplicaManager.scala:988)
	at kafka.server.ReplicaManager$$Lambda/0x00007fc60c6620c8.apply(Unknown Source)
	at scala.collection.mutable.HashMap$Node.foreach(HashMap.scala:642)
	at scala.collection.mutable.HashMap.foreach(HashMap.scala:504)
	at kafka.server.ReplicaManager.$anonfun$addCompletePurgatoryAction$1(ReplicaManager.scala:979)
	at kafka.server.ReplicaManager$$Lambda/0x00007fc60c65e210.run(Unknown Source)
	at org.apache.kafka.server.DelayedActionQueue.tryCompleteActions(DelayedActionQueue.java:45)
	at kafka.server.ReplicaManager.tryCompleteActions(ReplicaManager.scala:774)
	at kafka.server.KafkaApis.handle(KafkaApis.scala:289)
	at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:158)
	at java.lang.Thread.runWith(Thread.java:1596)
	at java.lang.Thread.run(Thread.java:1583)
```
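The lock-order inversion can be modelled outside Kafka with two plain JDK locks. This is a minimal, hypothetical sketch, not Kafka code. `partitionLock` stands in for the share partition's `ReentrantReadWriteLock`, `delayedFetchLock` stands in for the `ReentrantLock` taken in `DelayedOperation.safeTryComplete`, and the thread names only mirror the dump. Each thread takes its first lock and waits on a latch until the other has done the same. Then each requests its second lock, in the opposite order from the other thread. `ThreadMXBean.findDeadlockedThreads()` should then report the same kind of ownable-synchronizer cycle that the thread dump shows.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOrderDeadlock {

    /** Starts two threads that take the locks in opposite order; returns true if the JVM reports a deadlock. */
    public static boolean deadlockDetected() throws InterruptedException {
        // Hypothetical stand-ins for the share partition lock and the delayed operation lock.
        ReentrantReadWriteLock partitionLock = new ReentrantReadWriteLock();
        ReentrantLock delayedFetchLock = new ReentrantLock();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Models "PersisterStateManager": partition write lock first, then the delayed-fetch lock.
        Thread persister = new Thread(() -> {
            partitionLock.writeLock().lock();
            bothHoldFirstLock.countDown();
            awaitQuietly(bothHoldFirstLock);
            delayedFetchLock.lock(); // blocks: held by the handler thread
        }, "PersisterStateManager");

        // Models "data-plane-kafka-request-handler-6": delayed-fetch lock first, then the partition read lock.
        Thread handler = new Thread(() -> {
            delayedFetchLock.lock();
            bothHoldFirstLock.countDown();
            awaitQuietly(bothHoldFirstLock);
            partitionLock.readLock().lock(); // blocks: write lock held by the persister thread
        }, "data-plane-kafka-request-handler-6");

        // Daemon threads, so the deadlocked pair does not keep the JVM alive.
        persister.setDaemon(true);
        handler.setDaemon(true);
        persister.start();
        handler.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 50; i++) { // poll for up to roughly 5 seconds
            long[] ids = mx.findDeadlockedThreads(); // null when no cycle exists
            if (ids != null && ids.length >= 2) {
                return true;
            }
            Thread.sleep(100);
        }
        return false;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlock detected: " + deadlockDetected());
    }
}
```

The latch only makes the unlucky interleaving deterministic. With real share consumers the same cycle can appear whenever the persister callback and a request handler race on one partition, which is why two concurrent consumers are enough to trigger it. As a general rule, taking the two locks in a single consistent order, or calling into the purgatory only after releasing the partition lock, removes this kind of cycle.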
