Rajini Sivaram created KAFKA-7136:
-------------------------------------
Summary: PushHttpMetricsReporter may deadlock when processing metrics changes
Key: KAFKA-7136
URL: https://issues.apache.org/jira/browse/KAFKA-7136
Project: Kafka
Issue Type: Bug
Components: metrics
Affects Versions: 1.1.0, 2.0.0
Reporter: Rajini Sivaram
Assignee: Rajini Sivaram
Fix For: 2.0.0
We noticed a deadlock in {{PushHttpMetricsReporter}}. Locking for metrics was
changed under KAFKA-6765 to avoid {{NullPointerException}} in metrics reporters
caused by concurrent reads and updates. {{PushHttpMetricsReporter}} acquires its
own lock to process metric registration ({{metricChange}}), which is invoked
while the sensor lock is held. It also reads metric values, which requires the
sensor lock, while holding its own lock, so the two locks are acquired in
opposite orders. This resulted in the deadlock below.
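To make the lock-ordering cycle concrete, here is a minimal model of it using two plain Java monitors; it does not use any Kafka classes. The names `LockOrderDeadlock`, `sensorLock`, `reporterLock`, `registerer` and `reader` are hypothetical stand-ins for the Sensor lock, the reporter's internal lock, and the two threads in the dump. A `CountDownLatch` forces the unlucky interleaving, and `ThreadMXBean.findDeadlockedThreads()` reports the resulting cycle, much as jstack does.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical model of the lock inversion described above. The two monitors
// stand in for the Sensor lock and the reporter's internal lock.
final class LockOrderDeadlock {
    private final Object sensorLock = new Object();
    private final Object reporterLock = new Object();

    // Starts the two conflicting paths and returns how many of the two model
    // threads the JVM reports as deadlocked (2 once the cycle has formed).
    static int detectDeadlock() {
        LockOrderDeadlock model = new LockOrderDeadlock();
        // Each thread counts down after taking its first lock and then waits,
        // so both hold one monitor before requesting the other.
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Registration path (Sensor.add -> Metrics.registerMetric ->
        // metricChange): sensor lock first, then reporter lock.
        Thread registerer = new Thread(() -> {
            synchronized (model.sensorLock) {
                arriveAndAwait(bothHoldFirstLock);
                synchronized (model.reporterLock) {
                    // would record the newly registered metric
                }
            }
        }, "registerer");

        // Reporting path (HttpReporter.run -> KafkaMetric.value): reporter
        // lock first, then sensor lock, i.e. the inverse order.
        Thread reader = new Thread(() -> {
            synchronized (model.reporterLock) {
                arriveAndAwait(bothHoldFirstLock);
                synchronized (model.sensorLock) {
                    // would read the metric value
                }
            }
        }, "reader");

        // Daemon threads so the deadlocked pair does not keep the JVM alive.
        registerer.setDaemon(true);
        reader.setDaemon(true);
        registerer.start();
        reader.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            long[] ids = mx.findDeadlockedThreads(); // null when no cycle exists
            int ours = 0;
            if (ids != null) {
                for (long id : ids) {
                    if (id == registerer.getId() || id == reader.getId()) ours++;
                }
            }
            if (ours == 2) return ours;
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return 0;
    }

    // Signals that this thread holds its first lock, then waits for the other.
    private static void arriveAndAwait(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlocked model threads: " + detectDeadlock());
    }
}
```

Only the model's own two thread IDs are counted, so the check is not confused by other deadlocked threads that might exist in the same JVM.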
{quote}
Found one Java-level deadlock:
Java stack information for the threads listed above:
===================================================
"StreamThread-7":
	at org.apache.kafka.tools.PushHttpMetricsReporter.metricChange(PushHttpMetricsReporter.java:144)
	- waiting to lock <0x0000000655a54310> (a java.lang.Object)
	at org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:563)
	- locked <0x0000000655a44a28> (a org.apache.kafka.common.metrics.Metrics)
	at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:236)
	- locked <0x000000065629c170> (a org.apache.kafka.common.metrics.Sensor)
	at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:217)
	at org.apache.kafka.common.network.Selector$SelectorMetrics.maybeRegisterConnectionMetrics(Selector.java:1016)
	at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:462)
	at org.apache.kafka.common.network.Selector.poll(Selector.java:425)
	at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:510)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:271)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:242)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:218)
	at org.apache.kafka.clients.consumer.internals.Fetcher.getTopicMetadata(Fetcher.java:274)
	at org.apache.kafka.clients.consumer.internals.Fetcher.getAllTopicMetadata(Fetcher.java:254)
	at org.apache.kafka.clients.consumer.KafkaConsumer.listTopics(KafkaConsumer.java:1820)
	at org.apache.kafka.clients.consumer.KafkaConsumer.listTopics(KafkaConsumer.java:1798)
	at org.apache.kafka.streams.processor.internals.StoreChangelogReader.refreshChangelogInfo(StoreChangelogReader.java:224)
	at org.apache.kafka.streams.processor.internals.StoreChangelogReader.initialize(StoreChangelogReader.java:121)
	at org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:74)
	at org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:317)
	at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:824)
	at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767)
	at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736)
"pool-17-thread-1":
	at org.apache.kafka.common.metrics.KafkaMetric.measurableValue(KafkaMetric.java:82)
	- waiting to lock <0x000000065629c170> (a org.apache.kafka.common.metrics.Sensor)
	at org.apache.kafka.common.metrics.KafkaMetric.value(KafkaMetric.java:58)
	at org.apache.kafka.tools.PushHttpMetricsReporter$HttpReporter.run(PushHttpMetricsReporter.java:177)
	- locked <0x0000000655a54310> (a java.lang.Object)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Found 1 deadlock.
{quote}
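The report does not spell out the fix. One general way to break a cycle like this, shown here as an illustrative assumption rather than as the change that shipped in 2.0.0, is to guard metric values with a separate lock that is always acquired last and never held while calling out to other code. In the hypothetical `SeparateMetricLock` model below, registration takes the sensor lock then the reporter lock, recording takes the sensor lock then `metricLock`, and the reporter takes its own lock then `metricLock`, so no two paths acquire the same pair of locks in opposite orders.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical remedy sketch: metric values are guarded by a dedicated
// metricLock instead of the sensor lock. Every path respects the global order
// sensorLock -> reporterLock -> metricLock, so no lock cycle can form.
final class SeparateMetricLock {
    private final Object sensorLock = new Object();   // stands in for the Sensor monitor
    private final Object reporterLock = new Object(); // reporter's internal lock
    private final Object metricLock = new Object();   // guards the metric value only
    private double value;                             // hypothetical metric value

    // Registration path: sensorLock -> reporterLock (unchanged order).
    void register() {
        synchronized (sensorLock) {
            synchronized (reporterLock) {
                // reporter would record the newly registered metric here
            }
        }
    }

    // Recording path: sensorLock -> metricLock.
    void record(double v) {
        synchronized (sensorLock) {
            synchronized (metricLock) {
                value += v;
            }
        }
    }

    // Reporter read path: reporterLock -> metricLock; sensorLock is never taken.
    double report() {
        synchronized (reporterLock) {
            synchronized (metricLock) {
                return value;
            }
        }
    }

    // Runs the registration/recording and reporting paths concurrently and
    // returns true if both threads finish within the timeout.
    static boolean runsWithoutDeadlock() {
        SeparateMetricLock m = new SeparateMetricLock();
        CountDownLatch done = new CountDownLatch(2);
        Thread registerer = new Thread(() -> {
            for (int i = 0; i < 100_000; i++) {
                m.register();
                m.record(1.0);
            }
            done.countDown();
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < 100_000; i++) {
                m.report();
            }
            done.countDown();
        });
        registerer.setDaemon(true);
        reader.setDaemon(true);
        registerer.start();
        reader.start();
        try {
            return done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("completed without deadlock: " + runsWithoutDeadlock());
    }
}
```

The trade-off in this model is that a value read no longer excludes concurrent registration, only concurrent recording, which is enough for reading a consistent metric value.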