David Arthur created KAFKA-17766:
------------------------------------

             Summary: TopicBasedRemoteLogMetadataManager stuck in close
                 Key: KAFKA-17766
                 URL: https://issues.apache.org/jira/browse/KAFKA-17766
             Project: Kafka
          Issue Type: Bug
            Reporter: David Arthur
         Attachments: GradleWorkerMain-7952.txt

During a CI run, a build timed out because TopicBasedRemoteLogMetadataManager was stuck in its close method. The test worker thread is blocked in Thread.join, called from close:

{code:java}
"Test worker" #1 prio=5 os_prio=0 cpu=9155.23ms elapsed=9615.57s tid=0x00007fcc80029800 nid=0x1f12 in Object.wait() [0x00007fcc853f9000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(java.base@11.0.24/Native Method)
	- waiting on <no object reference available>
	at java.lang.Thread.join(java.base@11.0.24/Thread.java:1300)
	- waiting to re-lock in wait() <0x000000008189e9f8> (a org.apache.kafka.common.utils.KafkaThread)
	at java.lang.Thread.join(java.base@11.0.24/Thread.java:1375)
	at org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.close(TopicBasedRemoteLogMetadataManager.java:575)
{code}

Meanwhile, the initialization thread is parked waiting for the write lock in initializeResources:

{code:java}
"RLMMInitializationThread" #9511 prio=5 os_prio=0 cpu=1.40ms elapsed=9222.98s tid=0x00007fcc8196f800 nid=0x12ef2 waiting on condition [0x00007fcbe05fe000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@11.0.24/Native Method)
	- parking to wait for <0x0000000081e364c0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(java.base@11.0.24/LockSupport.java:194)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@11.0.24/AbstractQueuedSynchronizer.java:885)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.base@11.0.24/AbstractQueuedSynchronizer.java:917)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@11.0.24/AbstractQueuedSynchronizer.java:1240)
	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(java.base@11.0.24/ReentrantReadWriteLock.java:959)
	at org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.initializeResources(TopicBasedRemoteLogMetadataManager.java:432)
	at org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager$$Lambda$2007/0x0000000100934c40.run(Unknown Source)
	at java.lang.Thread.run(java.base@11.0.24/Thread.java:829)
{code}

It seems we are joining the initialization thread on the assumption that it has completed (or will complete). This appears to be a lock race between the close method and the initialization thread, which results in a deadlock.
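
The suspected interaction can be reproduced in isolation. The following is a minimal sketch, not Kafka code: the class and method names are hypothetical, and it assumes that the closing thread holds the write lock while it joins the initialization thread, which the thread dump alone does not confirm. A timed join stands in for the untimed Thread.join so that the sketch terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the close/initialize interaction; not the Kafka implementation.
class CloseJoinDeadlockSketch {

    // Returns true if the init thread was still alive when the timed join gave up,
    // i.e. an untimed join() at that point would never have returned.
    // joinTimeoutMs must be > 0, because join(0) waits forever.
    static boolean joinWhileHoldingWriteLock(long joinTimeoutMs) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch closerHasLock = new CountDownLatch(1);

        // Plays the role of the initialization thread: it needs the write lock to finish.
        Thread init = new Thread(() -> {
            try {
                closerHasLock.await(); // force the unlucky interleaving
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            lock.writeLock().lock(); // parks here while the closer holds the lock
            try {
                // initialization work would go here
            } finally {
                lock.writeLock().unlock();
            }
        }, "init-sketch");
        init.start();

        boolean stuck;
        lock.writeLock().lock(); // assumption: close() holds the write lock
        try {
            closerHasLock.countDown();
            init.join(joinTimeoutMs); // the real code would block in join() here
            stuck = init.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            lock.writeLock().unlock(); // releasing the lock lets init finish
        }
        return stuck;
    }

    public static void main(String[] args) {
        System.out.println("init still blocked while lock held: "
                + joinWhileHoldingWriteLock(200));
    }
}
```

If this is indeed the pattern, joining the initialization thread only after releasing the lock, or using a bounded join, would avoid the hang. Whether either fits TopicBasedRemoteLogMetadataManager depends on what the lock is meant to protect during close.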