Ivan San Jose created BEAM-10928:
------------------------------------

             Summary: FlinkDistributionGauge and FlinkGauge metrics are 
exported as zero to Prometheus when using any of Flink's Prometheus reporters
                 Key: BEAM-10928
                 URL: https://issues.apache.org/jira/browse/BEAM-10928
             Project: Beam
          Issue Type: Bug
          Components: runner-flink
    Affects Versions: 2.23.0
            Reporter: Ivan San Jose


To be honest, I'm really lost on this one. Let me explain the issue:

Beam has its own metric types (org/apache/beam/sdk/metrics/Metrics.java): 
counter, distribution, and gauge. Depending on the runner, it wraps them in 
the corresponding runner types. For example, for Flink, Beam wraps its Gauge 
type in a class called FlinkGauge, which implements Flink's Gauge<Long>.

Also, Beam's Distribution metric is wrapped in a Flink 
Gauge<DistributionResult>, where DistributionResult is a Beam type containing 
min, max, sum, and count.
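
To make the two wrappings concrete, here is a minimal sketch of the shape described above. It does not use the real Beam or Flink classes: the Gauge interface, DistributionResult, LongGaugeWrapper, and DistributionGaugeWrapper below are simplified, hypothetical stand-ins that only mirror the idea of a gauge exposing a single getValue().

```java
// Simplified stand-ins for illustration only; NOT the real Beam or Flink classes.
class WrappingSketch {

    /** Stand-in for a Flink-style gauge: one method that returns the current value. */
    interface Gauge<T> {
        T getValue();
    }

    /** Stand-in for a Beam-style distribution result holding min, max, sum, count. */
    static final class DistributionResult {
        final long min;
        final long max;
        final long sum;
        final long count;

        DistributionResult(long min, long max, long sum, long count) {
            this.min = min;
            this.max = max;
            this.sum = sum;
            this.count = count;
        }
    }

    /** Scalar gauge wrapper: the exposed value is a plain Long. */
    static final class LongGaugeWrapper implements Gauge<Long> {
        private volatile long latest;

        void update(long value) {
            latest = value;
        }

        @Override
        public Long getValue() {
            return latest;
        }
    }

    /** Distribution wrapper: the exposed value is a composite object, not a number. */
    static final class DistributionGaugeWrapper implements Gauge<DistributionResult> {
        private volatile DistributionResult latest = new DistributionResult(0, 0, 0, 0);

        void update(DistributionResult result) {
            latest = result;
        }

        @Override
        public DistributionResult getValue() {
            return latest;
        }
    }

    public static void main(String[] args) {
        LongGaugeWrapper gauge = new LongGaugeWrapper();
        gauge.update(7L);

        DistributionGaugeWrapper distribution = new DistributionGaugeWrapper();
        distribution.update(new DistributionResult(1, 9, 10, 2));

        Object gaugeValue = gauge.getValue();
        Object distributionValue = distribution.getValue();
        System.out.println(gaugeValue instanceof Number);        // true
        System.out.println(distributionValue instanceof Number); // false
    }
}
```

The point of the sketch is only that a gauge typed on a composite result hands a non-numeric object to whoever calls getValue().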

Then, if you are using Flink and want to export those metrics to Prometheus 
through flink-metrics-prometheus, you will see that they are always zero. If 
you set the DEBUG log level for the "org.apache.flink.metrics.prometheus" 
package, you will see errors like the following:
{code}
2020-09-18 06:27:04,387 DEBUG Invalid type for Gauge org.apache.beam.runners.flink.metrics.FlinkMetricContainer$FlinkDistributionGauge@30211d3f: org.apache.beam.sdk.metrics.AutoValue_DistributionResult, only number types and booleans are supported by this reporter.
2020-09-18 06:27:04,394 DEBUG Invalid type for Gauge org.apache.beam.runners.flink.metrics.FlinkMetricContainer$FlinkGauge@2ad1562: org.apache.beam.sdk.metrics.AutoValue_GaugeResult, only number types and booleans are supported by this reporter.
{code}

This is really weird, because if we check the source code of 
AbstractPrometheusReporter, we can see that it takes the value from Flink's 
Gauge using getValue():
https://github.com/apache/flink/blob/master/flink-metrics/flink-metrics-prometheus/src/main/java/org/apache/flink/metrics/prometheus/AbstractPrometheusReporter.java#L225
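
As a rough sketch of the behaviour the DEBUG message suggests (only numbers and booleans are accepted, anything else is exported as zero), the conversion could look like the following. This is not the actual reporter code: the Gauge interface and toPrometheusValue are hypothetical stand-ins, and the logging is reduced to a print on stderr.

```java
// Hypothetical stand-in for the reporter's conversion step; NOT the real Flink code.
class ReporterSketch {

    /** Stand-in for a Flink-style gauge: one method that returns the current value. */
    interface Gauge<T> {
        T getValue();
    }

    /** Converts a gauge value to a double, falling back to 0 for unsupported types. */
    static double toPrometheusValue(Gauge<?> gauge) {
        Object value = gauge.getValue();
        if (value == null) {
            return 0;
        }
        if (value instanceof Number) {
            return ((Number) value).doubleValue();
        }
        if (value instanceof Boolean) {
            return ((Boolean) value) ? 1 : 0;
        }
        // Anything else (for example a composite result object) is reported as zero.
        System.err.println("Invalid type for Gauge " + gauge + ": "
                + value.getClass().getName()
                + ", only number types and booleans are supported by this reporter.");
        return 0;
    }

    public static void main(String[] args) {
        Gauge<Long> numeric = () -> 42L;
        Gauge<Boolean> flag = () -> true;
        // A String stands in here for any non-numeric, composite gauge value.
        Gauge<String> composite = () -> "min=1,max=9,sum=10,count=2";

        System.out.println(toPrometheusValue(numeric));   // 42.0
        System.out.println(toPrometheusValue(flag));      // 1.0
        System.out.println(toPrometheusValue(composite)); // 0.0
    }
}
```

Under this assumption, a gauge whose getValue() yields a composite object would always show up as zero in Prometheus, which matches what I am seeing.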

FlinkGauge.getValue() should return a long rather than an 
org.apache.beam.sdk.metrics.AutoValue_GaugeResult, so I don't understand what 
is happening there. Maybe the AutoValue mechanism is messing things up?


