Hi all,
I would like to start a discussion about how to differentiate
monotonic and non-monotonic counters in Flink metrics.

Monotonic (ever-increasing) counters can benefit from automatic reset
detection on the monitoring system side: when the value drops, we can
safely assume the process was reset and synthetically adjust the
value.
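
To make the reset-detection idea concrete, here is a minimal sketch of
what a monitoring backend could do with raw samples of a monotonic
counter. The class and method names are hypothetical and do not come
from Flink or any reporter; it only assumes that a counter restarts
from zero after a process reset.

```java
/** Hypothetical helper: turns raw counter samples into a reset-adjusted total. */
class ResetAwareAccumulator {
    private long lastRaw = 0;   // most recent raw sample
    private long offset = 0;    // total accumulated before the latest reset
    private boolean seen = false;

    /** Feeds one raw sample and returns the reset-adjusted cumulative value. */
    long observe(long raw) {
        if (seen && raw < lastRaw) {
            // The value dropped. For a monotonic counter this can only mean
            // a reset, so carry the last value seen before the drop forward.
            offset += lastRaw;
        }
        lastRaw = raw;
        seen = true;
        return offset + raw;
    }
}
```

This adjustment is only safe when the counter is known to be
monotonic; for a counter that may legitimately decrease, the same drop
would be misread as a reset.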

Historically, a Flink Counter can be decremented, or incremented by a
negative value, but this has almost never been used intentionally
across the Flink codebase, i.e., if a counter gets decremented, it is
usually a bug [1].
So although system-emitted counters are effectively monotonic, exporters
must respect the org.apache.flink.metrics.Counter contract and assume
non-monotonicity.

I'd like to propose deprecating org.apache.flink.metrics.Counter#dec
in favor of a new UpDownCounter implementation. This matches modern
metric APIs like OTel, where the regular Counter is monotonic [2] and
an additional UpDownCounter supports [3] negative additions.
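
As a rough illustration of the proposed split, the sketch below
declares a Counter whose dec methods are deprecated, next to a separate
UpDownCounter. Only the inc/dec/getCount methods mirror the existing
Flink Counter interface; the UpDownCounter interface, its add method,
and SimpleUpDownCounter are hypothetical names chosen for this example,
not an agreed design.

```java
/** Sketch of the existing counter shape, with dec deprecated under the proposal. */
interface Counter {
    void inc();

    void inc(long n);

    /** Deprecated in this sketch; callers would move to UpDownCounter. */
    @Deprecated
    void dec();

    /** Deprecated in this sketch; callers would move to UpDownCounter. */
    @Deprecated
    void dec(long n);

    long getCount();
}

/** Hypothetical new metric type for values that can legitimately go down. */
interface UpDownCounter {
    /** Adds delta to the current value; delta may be negative. */
    void add(long delta);

    long getCount();
}

/** Hypothetical non-thread-safe implementation, analogous to a simple counter. */
final class SimpleUpDownCounter implements UpDownCounter {
    private long count;

    @Override
    public void add(long delta) {
        count += delta;
    }

    @Override
    public long getCount() {
        return count;
    }
}
```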

While deprecation seems to be the cleanest approach, we could still
avoid it by introducing a MonotonicCounter and migrating all Flink
counters to it, or by expanding the Counter interface to declare
monotonicity (based on the implementation).
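
For the last alternative, one possible shape is a default method on the
counter interface that implementations override. Everything here is
hypothetical: DeclaringCounter stands in for an extended Counter, and
isMonotonic, ExportTypes, and ExportType are names invented for the
sketch. Defaulting to false is an assumption that keeps today's
contract for existing implementations.

```java
/** Hypothetical stand-in for a Counter interface that can declare monotonicity. */
interface DeclaringCounter {
    void inc(long n);

    long getCount();

    /** Defaults to false so that existing implementations keep today's contract. */
    default boolean isMonotonic() {
        return false;
    }
}

/** Hypothetical reporter-side helper that picks an export type per counter. */
final class ExportTypes {
    enum ExportType { MONOTONIC_SUM, GAUGE }

    static ExportType exportTypeFor(DeclaringCounter counter) {
        // Only counters that opt in are exported as monotonic sums;
        // everything else falls back to a gauge-like representation.
        return counter.isMonotonic() ? ExportType.MONOTONIC_SUM : ExportType.GAUGE;
    }
}
```

A reporter could then branch on the declared monotonicity instead of
assuming the worst case for every counter.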

Recognising monotonicity will also align counter reporting across
monitoring systems. Today, for instance, the OTel reporter drops [4]
non-incremental data points with a warning, while the Prometheus
reporter casts [5] counters to Gauges.

I'm looking forward to your feedback.
Efrat

[1] https://issues.apache.org/jira/browse/FLINK-39892
[2] https://github.com/open-telemetry/opentelemetry-java/blob/main/api/all/src/main/java/io/opentelemetry/api/metrics/LongCounter.java#L40
[3] https://github.com/open-telemetry/opentelemetry-java/blob/main/api/all/src/main/java/io/opentelemetry/api/metrics/LongUpDownCounter.java
[4] https://issues.apache.org/jira/browse/FLINK-39893
[5] https://github.com/apache/flink/blob/master/flink-metrics/flink-metrics-prometheus/src/main/java/org/apache/flink/metrics/prometheus/AbstractPrometheusReporter.java#L177-L184
