[ https://issues.apache.org/jira/browse/KAFKA-17766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17888306#comment-17888306 ]
David Arthur commented on KAFKA-17766:
--------------------------------------
cc [~satish.duggana] [~ckamal]
> TopicBasedRemoteLogMetadataManager stuck in close
> -------------------------------------------------
>
> Key: KAFKA-17766
> URL: https://issues.apache.org/jira/browse/KAFKA-17766
> Project: Kafka
> Issue Type: Bug
> Components: Tiered-Storage
> Reporter: David Arthur
> Priority: Major
> Attachments: GradleWorkerMain-7952.txt
>
>
> During a CI run, a build timed out because this class was stuck in its
> close method.
>
> {code:java}
> "Test worker" #1 prio=5 os_prio=0 cpu=9155.23ms elapsed=9615.57s tid=0x00007fcc80029800 nid=0x1f12 in Object.wait() [0x00007fcc853f9000]
>    java.lang.Thread.State: WAITING (on object monitor)
>     at java.lang.Object.wait(java.base/Native Method)
>     - waiting on <no object reference available>
>     at java.lang.Thread.join(java.base/Thread.java:1300)
>     - waiting to re-lock in wait() <0x000000008189e9f8> (a org.apache.kafka.common.utils.KafkaThread)
>     at java.lang.Thread.join(java.base/Thread.java:1375)
>     at org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.close(TopicBasedRemoteLogMetadataManager.java:575)
> {code}
>
> {code:java}
> "RLMMInitializationThread" #9511 prio=5 os_prio=0 cpu=1.40ms elapsed=9222.98s tid=0x00007fcc8196f800 nid=0x12ef2 waiting on condition [0x00007fcbe05fe000]
>    java.lang.Thread.State: WAITING (parking)
>     at jdk.internal.misc.Unsafe.park(java.base/Native Method)
>     - parking to wait for <0x0000000081e364c0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>     at java.util.concurrent.locks.LockSupport.park(java.base/LockSupport.java:194)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base/AbstractQueuedSynchronizer.java:885)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.base/AbstractQueuedSynchronizer.java:917)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base/AbstractQueuedSynchronizer.java:1240)
>     at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(java.base/ReentrantReadWriteLock.java:959)
>     at org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.initializeResources(TopicBasedRemoteLogMetadataManager.java:432)
>     at org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager$$Lambda$2007/0x0000000100934c40.run(Unknown Source)
>     at java.lang.Thread.run(java.base/Thread.java:829)
> {code}
>
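> It seems that close() joins the initialization thread on the assumption that
> it has completed (or will complete). The "Test worker" thread is waiting in
> Thread.join() inside close(), while RLMMInitializationThread is parked
> waiting for the ReentrantReadWriteLock write lock in initializeResources().
> This appears to be a lock race between the close method and the
> initialization thread that results in a deadlock.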
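
The hang can be modelled with a small, self-contained sketch. It assumes, as the traces suggest but do not prove, that close() holds the ReentrantReadWriteLock write lock while it joins the initialization thread. The class, its fields, and the CountDownLatch used to force the unlucky interleaving are hypothetical and are not the actual TopicBasedRemoteLogMetadataManager code; a plain Thread stands in for KafkaThread. closeHoldingLock reproduces the shape of the traces (a timed join replaces the untimed Thread.join() so the sketch terminates), while closeReleasingLock shows one possible fix: set a shutdown flag under the lock, release it, and only then join.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified model of the hang in the traces above. This is NOT the
// real TopicBasedRemoteLogMetadataManager; names and the latch are illustrative.
final class CloseJoinDeadlockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Test-only hook (assumption): forces the interleaving where close() takes the
    // lock before the initialization thread reaches writeLock().lock().
    private final CountDownLatch closeHoldsLock = new CountDownLatch(1);
    private volatile boolean closing = false;
    private final Thread initThread =
            new Thread(this::initializeResources, "RLMMInitializationThread");

    static CloseJoinDeadlockSketch start() {
        CloseJoinDeadlockSketch sketch = new CloseJoinDeadlockSketch();
        sketch.initThread.setDaemon(true); // a stuck sketch thread cannot keep the JVM alive
        sketch.initThread.start();
        return sketch;
    }

    // Stands in for initializeResources(): it needs the write lock to do its work.
    private void initializeResources() {
        try {
            closeHoldsLock.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        lock.writeLock().lock(); // the analogue of where RLMMInitializationThread is parked
        try {
            if (closing) {
                return; // shutdown requested: skip initialization
            }
            // ... initialization work would happen here (omitted) ...
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Problematic shape: joins the initialization thread while still holding the lock.
    // Returns true if the initialization thread terminated within timeoutMs.
    boolean closeHoldingLock(long timeoutMs) {
        lock.writeLock().lock();
        try {
            closeHoldsLock.countDown();
            closing = true;
            return joinQuietly(timeoutMs); // an untimed join() here would never return
        } finally {
            lock.writeLock().unlock();
        }
    }

    // One possible fix: publish the shutdown flag under the lock, release it, then join.
    // Returns true if the initialization thread terminated within timeoutMs.
    boolean closeReleasingLock(long timeoutMs) {
        lock.writeLock().lock();
        try {
            closeHoldsLock.countDown();
            closing = true;
        } finally {
            lock.writeLock().unlock();
        }
        return joinQuietly(timeoutMs);
    }

    private boolean joinQuietly(long timeoutMs) {
        try {
            initThread.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !initThread.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("join while holding lock finished: "
                + start().closeHoldingLock(200));
        System.out.println("join after releasing lock finished: "
                + start().closeReleasingLock(5_000));
    }
}
```

Under these assumptions, closeHoldingLock(200) returns false because the initialization thread is still parked in writeLock().lock() when the timed join gives up, whereas closeReleasingLock(5_000) returns true once the lock is released before joining.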