David Arthur created KAFKA-17766:
------------------------------------

             Summary: TopicBasedRemoteLogMetadataManager stuck in close
                 Key: KAFKA-17766
                 URL: https://issues.apache.org/jira/browse/KAFKA-17766
             Project: Kafka
          Issue Type: Bug
            Reporter: David Arthur
         Attachments: GradleWorkerMain-7952.txt

During a CI run, a build timed out because this class was stuck in its close method.

 
{code:java}
"Test worker" #1 prio=5 os_prio=0 cpu=9155.23ms elapsed=9615.57s tid=0x00007fcc80029800 nid=0x1f12 in Object.wait()  [0x00007fcc853f9000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(java.base@11.0.24/Native Method)
    - waiting on <no object reference available>
    at java.lang.Thread.join(java.base@11.0.24/Thread.java:1300)
    - waiting to re-lock in wait() <0x000000008189e9f8> (a org.apache.kafka.common.utils.KafkaThread)
    at java.lang.Thread.join(java.base@11.0.24/Thread.java:1375)
    at org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.close(TopicBasedRemoteLogMetadataManager.java:575)
{code}
 
{code:java}
"RLMMInitializationThread" #9511 prio=5 os_prio=0 cpu=1.40ms elapsed=9222.98s tid=0x00007fcc8196f800 nid=0x12ef2 waiting on condition  [0x00007fcbe05fe000]
   java.lang.Thread.State: WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base@11.0.24/Native Method)
    - parking to wait for  <0x0000000081e364c0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(java.base@11.0.24/LockSupport.java:194)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@11.0.24/AbstractQueuedSynchronizer.java:885)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.base@11.0.24/AbstractQueuedSynchronizer.java:917)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@11.0.24/AbstractQueuedSynchronizer.java:1240)
    at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(java.base@11.0.24/ReentrantReadWriteLock.java:959)
    at org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.initializeResources(TopicBasedRemoteLogMetadataManager.java:432)
    at org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager$$Lambda$2007/0x0000000100934c40.run(Unknown Source)
    at java.lang.Thread.run(java.base@11.0.24/Thread.java:829)
{code}
 

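It seems we are joining the initialization thread on the assumption that it has completed (or will complete). The dumps show the "Test worker" thread blocked in Thread.join() inside close(), while RLMMInitializationThread is parked acquiring the ReentrantReadWriteLock write lock in initializeResources(). This appears to be a lock race between the close method and the initialization thread that results in a deadlock: if close() holds that lock while it joins, neither thread can make progress.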
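A minimal, self-contained sketch of the suspected pattern, assuming close() holds the same ReentrantReadWriteLock while it joins the initialization thread (the dumps do not show which locks the "Test worker" thread holds). The class, method and thread names are hypothetical and are not taken from TopicBasedRemoteLogMetadataManager. The first method reproduces the hang with a timed join so it terminates; the second shows one possible mitigation, releasing the lock before joining.

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of the close()/initialization interaction; not Kafka code.
public class CloseJoinSketch {

    // Stand-in for initializeResources(): the thread must take the write lock.
    private static Thread newInitThread(ReentrantReadWriteLock lock) {
        Thread t = new Thread(() -> {
            lock.writeLock().lock(); // parks here while another thread holds the lock
            try {
                // initialization work would happen here
            } finally {
                lock.writeLock().unlock();
            }
        }, "RLMMInitializationThread-sketch");
        t.setDaemon(true); // never keep the JVM alive if the sketch misbehaves
        return t;
    }

    // Suspected bug: join while holding the lock the joined thread needs.
    // Returns true only if the init thread finished within timeoutMs.
    static boolean joinWhileHoldingWriteLock(long timeoutMs) throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        Thread init = newInitThread(lock);
        lock.writeLock().lock(); // close() takes the lock first
        try {
            init.start();
            init.join(timeoutMs); // cannot succeed: init is blocked on our lock
            return !init.isAlive();
        } finally {
            lock.writeLock().unlock(); // only now can init acquire the lock
            init.join();               // clean up the sketch's thread
        }
    }

    // Possible mitigation: update shared state under the lock, join after releasing it.
    static boolean joinAfterReleasingLock(long timeoutMs) throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        Thread init = newInitThread(lock);
        lock.writeLock().lock();
        try {
            init.start();
            // e.g. mark the manager as closing while still holding the lock
        } finally {
            lock.writeLock().unlock();
        }
        init.join(timeoutMs); // lock is free, so init can finish
        return !init.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        // prints "false" (join timed out)
        System.out.println("join while holding lock finished: " + joinWhileHoldingWriteLock(200));
        // prints "true"
        System.out.println("join after releasing lock finished: " + joinAfterReleasingLock(5000));
    }
}
{code}

If close() instead held the read lock, the effect would be the same, since a write-lock request from another thread blocks while any read lock is held.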



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
