[
https://issues.apache.org/jira/browse/FLINK-31372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Krzysztof Dziolak updated FLINK-31372:
--------------------------------------
Description:
We've identified a memory leak that occurs when any of the metric reporters
fails with an exception. In such cases the HTTPExchanges are not closed
properly in io.prometheus.client.exporter.HTTPServer.HTTPMetricHandler.
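The sketch below is only illustrative (it is not the actual HTTPMetricHandler
source) and shows how a com.sun.net.httpserver handler can leak an exchange
when metric serialization throws, and how closing the exchange in a finally
block would prevent that; the MetricsHandler and writeMetrics names are
hypothetical.
{code:java}
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative handler: if writeMetrics() throws before the exchange is
// closed, the exchange and its buffers are never released.
public class MetricsHandler implements HttpHandler {

    @Override
    public void handle(HttpExchange exchange) throws IOException {
        try {
            byte[] body = writeMetrics(); // may throw, e.g. NoSuchMethodError
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        } finally {
            // Closing in finally releases the exchange even when metric
            // collection fails with an unchecked error.
            exchange.close();
        }
    }

    private byte[] writeMetrics() {
        // Placeholder for scraping the registered collectors.
        return "flink_up 1\n".getBytes(StandardCharsets.UTF_8);
    }
}
{code}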
In our case the failure was triggered by an incompatible Kafka client, which
made metric collection fail with:
{{Exception in thread "prometheus-http-1-72873" java.lang.NoSuchMethodError:
'double org.apache.kafka.common.Metric.value()'}}
Should the Prometheus Reporter handle metric collection defensively (by
suppressing exceptions) to guarantee metric delivery and avoid similar memory
leaks?
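One possible defensive approach, sketched below against the
io.prometheus.client.Collector API, is a delegating collector that suppresses
any Throwable raised by the wrapped collector; the SuppressingCollector name
is hypothetical and this is not the existing PrometheusReporter behaviour.
{code:java}
import io.prometheus.client.Collector;
import java.util.Collections;
import java.util.List;

// Hypothetical delegating collector: suppresses failures from the wrapped
// collector so that one broken reporter (e.g. an incompatible Kafka client
// throwing NoSuchMethodError) cannot abort the whole scrape.
public class SuppressingCollector extends Collector {

    private final Collector delegate;

    public SuppressingCollector(Collector delegate) {
        this.delegate = delegate;
    }

    @Override
    public List<MetricFamilySamples> collect() {
        try {
            return delegate.collect();
        } catch (Throwable t) {
            // Report no samples for this collector and let the remaining
            // metrics still be delivered.
            return Collections.emptyList();
        }
    }
}
{code}
Such a wrapper could then be registered in place of the raw collector (e.g.
via Collector#register()), so a failing reporter degrades to missing samples
instead of an aborted scrape.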
was:
Basically I'm running Flink 1.15.1 with Docker, and the application often
starts to slow down because of OOM errors.
From the metrics data collected by Prometheus, it was observed that memory
usage and the number of threads kept increasing.
I tried removing the Kafka sink code and it looks normal, so I changed Flink
to 1.14.5 and it works fine.
Is this a bug?
> Memory Leak in HTTPMetricHandler when reporting fails
> -----------------------------------------------------
>
> Key: FLINK-31372
> URL: https://issues.apache.org/jira/browse/FLINK-31372
> Project: Flink
> Issue Type: Bug
> Components: Connectors / Kafka, kafka
> Affects Versions: 1.16.1, 1.15.4, 1.17.1
> Reporter: Krzysztof Dziolak
> Priority: Minor
>
> We've identified a memory leak that occurs when any of the metric reporters
> fails with an exception. In such cases the HTTPExchanges are not closed
> properly in io.prometheus.client.exporter.HTTPServer.HTTPMetricHandler.
> In our case the failure was triggered by an incompatible Kafka client, which
> made metric collection fail with:
> {{Exception in thread "prometheus-http-1-72873" java.lang.NoSuchMethodError:
> 'double org.apache.kafka.common.Metric.value()'}}
> Should the Prometheus Reporter handle metric collection defensively (by
> suppressing exceptions) to guarantee metric delivery and avoid similar
> memory leaks?
--
This message was sent by Atlassian Jira
(v8.20.10#820010)