[
https://issues.apache.org/jira/browse/KAFKA-16051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800814#comment-17800814
]
Octavian Ciubotaru commented on KAFKA-16051:
--------------------------------------------
Hi [~gharris1727], thank you for your insights. I certainly still have a lot
to learn about the project.
In the meantime I created the PR. The fix is to always lock on the herder
first and on the config backing store second. I tested it in my environment
and Kafka Connect starts successfully with the fix applied.
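For illustration, here is a minimal sketch of that lock ordering. It is not
the code from the PR: Store and Herder are simplified, hypothetical stand-ins
for MemoryConfigBackingStore and StandaloneHerder, and scheduledUpdate is an
assumed name for the executor-driven path seen in the stack trace. The point
is only that every path takes the herder monitor before the store monitor.

```java
import java.util.ArrayList;
import java.util.List;

public class LockOrderingSketch {

    /** Hypothetical stand-in for MemoryConfigBackingStore. */
    static class Store {
        private final List<String> taskConfigs = new ArrayList<>();
        private Runnable listener = () -> {};

        void setListener(Runnable listener) {
            this.listener = listener;
        }

        // Holds the store monitor while calling back into the herder,
        // as putTaskConfigs does in the stack trace.
        synchronized void putTaskConfigs(String config) {
            taskConfigs.add(config);
            listener.run();
        }
    }

    /** Hypothetical stand-in for StandaloneHerder. */
    static class Herder {
        private final Store store;
        private int updatesSeen = 0;

        Herder(Store store) {
            this.store = store;
            store.setListener(this::onTaskConfigUpdate);
        }

        // Callback from the store; needs the herder monitor.
        private synchronized void onTaskConfigUpdate() {
            updatesSeen++;
        }

        // Path taken by the connector thread: herder monitor is held.
        synchronized void requestTaskReconfiguration(String config) {
            updateConnectorTasks(config);
        }

        // Path taken by the executor thread. Taking the herder monitor
        // here as well means the store monitor is never acquired first.
        void scheduledUpdate(String config) {
            synchronized (this) {
                updateConnectorTasks(config);
            }
        }

        private void updateConnectorTasks(String config) {
            store.putTaskConfigs(config);
        }

        synchronized int updatesSeen() {
            return updatesSeen;
        }
    }

    /** Runs both paths concurrently and returns the number of callbacks. */
    static int runConcurrently(int iterations) throws InterruptedException {
        Herder herder = new Herder(new Store());
        Thread connector = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                herder.requestTaskReconfiguration("connector-" + i);
            }
        });
        Thread executor = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                herder.scheduledUpdate("executor-" + i);
            }
        });
        connector.start();
        executor.start();
        connector.join();
        executor.join();
        return herder.updatesSeen();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runConcurrently(10_000));
    }
}
```

Java monitors are reentrant, so the callback can re-enter the herder monitor
that the calling thread already holds. If the synchronized block in
scheduledUpdate is removed, the sketch takes the two monitors in opposite
orders on the two threads, which is the pattern shown in the dump below.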
> Deadlock on connector initialization
> ------------------------------------
>
> Key: KAFKA-16051
> URL: https://issues.apache.org/jira/browse/KAFKA-16051
> Project: Kafka
> Issue Type: Bug
> Components: KafkaConnect
> Affects Versions: 2.6.3, 3.6.1
> Reporter: Octavian Ciubotaru
> Priority: Major
>
>
> Tested with Kafka 3.6.1 and 2.6.3.
> The only plugin installed is confluentinc-kafka-connect-jdbc-10.7.4.
> Stack trace for Kafka 3.6.1:
> {noformat}
> Found one Java-level deadlock:
> =============================
> "pool-3-thread-1":
> waiting to lock monitor 0x00007fbc88006300 (object 0x0000000091002aa0, a org.apache.kafka.connect.runtime.standalone.StandaloneHerder),
> which is held by "Thread-9"
> "Thread-9":
> waiting to lock monitor 0x00007fbc88008800 (object 0x000000009101ccd8, a org.apache.kafka.connect.storage.MemoryConfigBackingStore),
> which is held by "pool-3-thread-1"
>
> Java stack information for the threads listed above:
> ===================================================
> "pool-3-thread-1":
> at org.apache.kafka.connect.runtime.standalone.StandaloneHerder$ConfigUpdateListener.onTaskConfigUpdate(StandaloneHerder.java:516)
> - waiting to lock <0x0000000091002aa0> (a org.apache.kafka.connect.runtime.standalone.StandaloneHerder)
> at org.apache.kafka.connect.storage.MemoryConfigBackingStore.putTaskConfigs(MemoryConfigBackingStore.java:137)
> - locked <0x000000009101ccd8> (a org.apache.kafka.connect.storage.MemoryConfigBackingStore)
> at org.apache.kafka.connect.runtime.standalone.StandaloneHerder.updateConnectorTasks(StandaloneHerder.java:483)
> at org.apache.kafka.connect.runtime.standalone.StandaloneHerder.lambda$null$2(StandaloneHerder.java:229)
> at org.apache.kafka.connect.runtime.standalone.StandaloneHerder$$Lambda$692/0x0000000840557440.run(Unknown Source)
> at java.util.concurrent.Executors$RunnableAdapter.call([email protected]/Executors.java:515)
> at java.util.concurrent.FutureTask.run([email protected]/FutureTask.java:264)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run([email protected]/ScheduledThreadPoolExecutor.java:304)
> at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1128)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:628)
> at java.lang.Thread.run([email protected]/Thread.java:829)
> "Thread-9":
> at org.apache.kafka.connect.storage.MemoryConfigBackingStore.putTaskConfigs(MemoryConfigBackingStore.java:129)
> - waiting to lock <0x000000009101ccd8> (a org.apache.kafka.connect.storage.MemoryConfigBackingStore)
> at org.apache.kafka.connect.runtime.standalone.StandaloneHerder.updateConnectorTasks(StandaloneHerder.java:483)
> at org.apache.kafka.connect.runtime.standalone.StandaloneHerder.requestTaskReconfiguration(StandaloneHerder.java:255)
> - locked <0x0000000091002aa0> (a org.apache.kafka.connect.runtime.standalone.StandaloneHerder)
> at org.apache.kafka.connect.runtime.HerderConnectorContext.requestTaskReconfiguration(HerderConnectorContext.java:50)
> at org.apache.kafka.connect.runtime.WorkerConnector$WorkerConnectorContext.requestTaskReconfiguration(WorkerConnector.java:548)
> at io.confluent.connect.jdbc.source.TableMonitorThread.run(TableMonitorThread.java:86)
> Found 1 deadlock.
> {noformat}
> The JDBC source connector loads the list of tables from the database and
> updates the configuration once the list is available. The deadlock is very
> consistent in my environment, probably because the database is on the same
> machine.
> It may be possible to avoid this situation by always locking the herder
> first and the config backing store second. From what I see,
> updateConnectorTasks is sometimes called with the herder lock already held
> and other times without it.
>