[ https://issues.apache.org/jira/browse/KAFKA-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guozhang Wang resolved KAFKA-7136.
----------------------------------
Resolution: Fixed
Fix Version/s: 1.1.1
> PushHttpMetricsReporter may deadlock when processing metrics changes
> --------------------------------------------------------------------
>
> Key: KAFKA-7136
> URL: https://issues.apache.org/jira/browse/KAFKA-7136
> Project: Kafka
> Issue Type: Bug
> Components: metrics
> Affects Versions: 1.1.0, 2.0.0
> Reporter: Rajini Sivaram
> Assignee: Rajini Sivaram
> Priority: Blocker
> Fix For: 2.0.0, 1.1.1
>
>
> We noticed a deadlock in {{PushHttpMetricsReporter}}. Locking for metrics was
> changed under KAFKA-6765 to avoid {{NullPointerException}} in metrics
> reporters caused by concurrent reads and updates. {{PushHttpMetricsReporter}}
> acquires its own lock to process metric registration, which is invoked while
> the sensor lock is held. When reporting, it also reads metric values, which
> attempts to acquire the sensor lock while holding its own lock (the inverse
> order). This resulted in the deadlock below.
> {quote}Found one Java-level deadlock:
> Java stack information for the threads listed above:
> ===================================================
> "StreamThread-7":
> at org.apache.kafka.tools.PushHttpMetricsReporter.metricChange(PushHttpMetricsReporter.java:144)
> - waiting to lock <0x0000000655a54310> (a java.lang.Object)
> at org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:563)
> - locked <0x0000000655a44a28> (a org.apache.kafka.common.metrics.Metrics)
> at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:236)
> - locked <0x000000065629c170> (a org.apache.kafka.common.metrics.Sensor)
> at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:217)
> at org.apache.kafka.common.network.Selector$SelectorMetrics.maybeRegisterConnectionMetrics(Selector.java:1016)
> at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:462)
> at org.apache.kafka.common.network.Selector.poll(Selector.java:425)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:510)
> at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:271)
> at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:242)
> at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:218)
> at org.apache.kafka.clients.consumer.internals.Fetcher.getTopicMetadata(Fetcher.java:274)
> at org.apache.kafka.clients.consumer.internals.Fetcher.getAllTopicMetadata(Fetcher.java:254)
> at org.apache.kafka.clients.consumer.KafkaConsumer.listTopics(KafkaConsumer.java:1820)
> at org.apache.kafka.clients.consumer.KafkaConsumer.listTopics(KafkaConsumer.java:1798)
> at org.apache.kafka.streams.processor.internals.StoreChangelogReader.refreshChangelogInfo(StoreChangelogReader.java:224)
> at org.apache.kafka.streams.processor.internals.StoreChangelogReader.initialize(StoreChangelogReader.java:121)
> at org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:74)
> at org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:317)
> at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:824)
> at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767)
> at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736)
> "pool-17-thread-1":
> at org.apache.kafka.common.metrics.KafkaMetric.measurableValue(KafkaMetric.java:82)
> - waiting to lock <0x000000065629c170> (a org.apache.kafka.common.metrics.Sensor)
> at org.apache.kafka.common.metrics.KafkaMetric.value(KafkaMetric.java:58)
> at org.apache.kafka.tools.PushHttpMetricsReporter$HttpReporter.run(PushHttpMetricsReporter.java:177)
> - locked <0x0000000655a54310> (a java.lang.Object)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Found 1 deadlock.
> {quote}
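For readers following the trace, the lock-order inversion can be sketched in
isolation. The class below is a hypothetical stand-in, not Kafka code and not
necessarily the change that was committed for this issue: sensorLock plays the
role of the Sensor monitor, reporterLock plays the role of the reporter's
private lock, and all names are assumptions made for illustration. The
registration path takes sensor lock then reporter lock, as "StreamThread-7"
does. The reporting path copies the metric list under the reporter lock and
reads values only after releasing it, so it never waits for the sensor lock
while holding the reporter lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleSupplier;

public class LockOrderSketch {
    // Hypothetical stand-in for the Sensor monitor held during Sensor.add().
    private final Object sensorLock = new Object();
    // Hypothetical stand-in for the reporter's private lock.
    private final Object reporterLock = new Object();
    private final List<DoubleSupplier> metrics = new ArrayList<>();

    // Registration path: sensor lock first, then reporter lock.
    public void register(DoubleSupplier metric) {
        synchronized (sensorLock) {
            synchronized (reporterLock) {
                metrics.add(metric);
            }
        }
    }

    // Models a metric value read that must take the sensor lock.
    public double readUnderSensorLock(double value) {
        synchronized (sensorLock) {
            return value;
        }
    }

    // Reporting path: snapshot under the reporter lock, read values after
    // releasing it, so the two locks are never held in the inverse order.
    public double report() {
        List<DoubleSupplier> snapshot;
        synchronized (reporterLock) {
            snapshot = new ArrayList<>(metrics);
        }
        double sum = 0.0;
        for (DoubleSupplier metric : snapshot) {
            sum += metric.getAsDouble();
        }
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        LockOrderSketch sketch = new LockOrderSketch();
        Thread registrar = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                sketch.register(() -> sketch.readUnderSensorLock(1.0));
            }
        });
        Thread reporter = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                sketch.report();
            }
        });
        // Daemon threads so the JVM can exit even if something does hang.
        registrar.setDaemon(true);
        reporter.setDaemon(true);
        registrar.start();
        reporter.start();
        registrar.join(10_000);
        reporter.join(10_000);
        System.out.println("both threads finished: "
                + (!registrar.isAlive() && !reporter.isAlive()));
    }
}
```

If report() instead called getAsDouble() inside the synchronized (reporterLock)
block, the two paths would acquire the locks in opposite orders, which is the
cycle shown in the thread dump above.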
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)