[ 
https://issues.apache.org/jira/browse/FLINK-38704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18063514#comment-18063514
 ] 

Mukul Gupta edited comment on FLINK-38704 at 3/6/26 11:44 AM:
--------------------------------------------------------------

This issue is present in Flink 1.19+. Earlier versions were not tested. 

The issue occurs because numeric YAML configuration values are stored as 
Integer/Long objects, but {{Properties.getProperty()}} returns only String 
values and yields null for any other type. When {{MetricConfig.getString()}} is 
called for a numeric value such as {{metrics.reporter.prom.port}}, the lookup 
finds no String, and the reporter falls back to its default value.
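
The lookup behavior described above can be reproduced with plain {{java.util.Properties}}, without Flink on the classpath. This is a minimal sketch, not Flink's actual code path: the class name, the helper {{readBack}}, and the short key {{port}} are made up for illustration, and 9249 stands in for the reporter default seen in the logs.

```java
import java.util.Properties;

public class NumericPropertyDemo {

    // Stores `value` under an illustrative key via the generic Map-style put(),
    // then reads it back through the String-only accessor with a default.
    static String readBack(Object value, String defaultValue) {
        Properties props = new Properties();
        props.put("port", value);
        // getProperty() treats any non-String value as missing.
        return props.getProperty("port", defaultValue);
    }

    public static void main(String[] args) {
        // Integer value: the lookup sees no String and returns the default.
        System.out.println(readBack(9999, "9249"));
        // Quoted value: stored as a String, so the configured port is returned.
        System.out.println(readBack("9999", "9249"));
        // A port range is a String as well, so it is returned unchanged.
        System.out.println(readBack("9000-9100", "9249"));
    }
}
```

Run as is, this prints 9249, 9999 and 9000-9100, which matches the observed behavior for an unquoted port, a quoted port, and a port range.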

*Note:* The bug only reproduces when a single number is assigned to the port 
(e.g., {{9999}}). It works correctly when a port range is used (e.g., 
{{9000-9100}}) because ranges are stored as Strings.

*Safety:* This change is safe and non-breaking. {{Properties}} is designed to 
hold String keys and values, and its accessors return only Strings, so 
converting values to String at insertion time preserves the expected behavior 
while fixing the type mismatch.
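
For illustration, converting at insertion time could look roughly like the sketch below. It is not the code from the pull request: the class name and the helper {{toProperties}} are hypothetical, and the input map only stands in for whatever the YAML parser produces.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class StringifyOnInsert {

    // Hypothetical helper: copies parsed config entries into Properties,
    // converting every value to its String form before it is stored.
    static Properties toProperties(Map<String, Object> parsed) {
        Properties props = new Properties();
        for (Map.Entry<String, Object> entry : parsed.entrySet()) {
            if (entry.getValue() == null) {
                continue; // Properties cannot hold null values
            }
            props.setProperty(entry.getKey(), String.valueOf(entry.getValue()));
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, Object> parsed = new LinkedHashMap<>();
        parsed.put("port", 9999);           // numeric, as an unquoted YAML value would be
        parsed.put("range", "9000-9100");   // already a String
        Properties props = toProperties(parsed);
        System.out.println(props.getProperty("port"));
        System.out.println(props.getProperty("range"));
    }
}
```

Values that are already Strings pass through unchanged, so existing configurations with quoted ports or port ranges would behave as before.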

{{Either the code needs to be changed as per the pull request, or the 
documentation needs to be corrected.}}

*Workaround:* Quote the port value in the YAML configuration: 
{{metrics.reporter.prom.port: "9999"}}



> Metrics reporter setup does not load Prometheus with correct configs/port
> -------------------------------------------------------------------------
>
>                 Key: FLINK-38704
>                 URL: https://issues.apache.org/jira/browse/FLINK-38704
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Metrics
>    Affects Versions: 2.0.1, 2.2.0, 2.1.1
>            Reporter: Mohsen Rezaei
>            Priority: Major
>              Labels: pull-request-available
>
> This was working in 1.x releases, but 2.x doesn't load the correct 
> config.
> Runtime Flink configurations loaded:
> {code:java}
> 2025-11-20 04:33:51.737 [main] INFO  
> org.apache.flink.configuration.GlobalConfiguration  - Loading configuration 
> property: metrics.reporter.prom.port, 9999
> 2025-11-20 04:33:51.738 [main] INFO  
> org.apache.flink.configuration.GlobalConfiguration  - Loading configuration 
> property: metrics.reporter.prom.factory.class, 
> org.apache.flink.metrics.prometheus.PrometheusReporterFactory
> {code}
> But the reporter setup [loads the default 
> port|https://github.com/apache/flink/blob/45ab6c816465e717d0eef2ad6672cbb0c1a73a7e/flink-metrics/flink-metrics-prometheus/src/main/java/org/apache/flink/metrics/prometheus/PrometheusReporterFactory.java#L33]
> {code:java}
> 2025-11-20 04:33:55.520 [main] INFO  
> org.apache.flink.metrics.prometheus.PrometheusReporter  - Started 
> PrometheusReporter HTTP server on port 9249.
> {code}
> and metrics are only served on port 9249:
> {code:java}
> flink@jm-0:~$ curl localhost:9999/metrics
> curl: (7) Failed to connect to localhost port 9999 after 0 ms: Couldn't 
> connect to server
> flink@jm-0:~$ curl localhost:9249/metrics
> # HELP flink_jobmanager_Status_JVM_GarbageCollector_Copy_TimeMsPerSecond 
> TimeMsPerSecond (scope: jobmanager_Status_JVM_GarbageCollector_Copy)
> # TYPE flink_jobmanager_Status_JVM_GarbageCollector_Copy_TimeMsPerSecond gauge
> flink_jobmanager_Status_JVM_GarbageCollector_Copy_TimeMsPerSecond{host="10_155_60_8",}
>  0.0
> ...
> {code}
> This potentially affects all the reporters loaded via their factory in 
> {{ReporterSetup}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
