[ https://issues.apache.org/jira/browse/FLINK-24756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17484771#comment-17484771 ]
Simon Frei commented on FLINK-24756:
------------------------------------
I think this is a duplicate of https://issues.apache.org/jira/browse/FLINK-7935.
I responded there with some thoughts on how to address it.
> Flink metric identifiers contain group variables.
> -------------------------------------------------
>
> Key: FLINK-24756
> URL: https://issues.apache.org/jira/browse/FLINK-24756
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Metrics
> Reporter: Frederic Hemery
> Priority: Major
>
> Metric identifiers are built by concatenating the closest
> {{ComponentMetricGroup}} metric identifier (which is configurable) and the
> whole hierarchy of groups that have been added.
> In a monitoring system like Datadog, this poses a challenge because it is
> tricky to aggregate across metric identifiers. Instead, Datadog relies on a
> single metric identifier with different tags to distinguish between
> timeseries.
>
> Using Flink Datadog integration, we get:
> ||Metric Name||Tags||
> |flink.operator.KafkaSourceReader.topic.resources.partition.0.committedOffset|[topic:resources,partition:0]|
> |flink.operator.KafkaSourceReader.topic.resources.partition.1.committedOffset|[topic:resources,partition:1]|
> |flink.operator.KafkaSourceReader.topic.resources.partition.2.committedOffset|[topic:resources,partition:2]|
> |flink.operator.KafkaSourceReader.topic.resources.partition.3.committedOffset|[topic:resources,partition:3]|
> |flink.operator.KafkaSourceReader.topic.resources.partition.4.committedOffset|[topic:resources,partition:4]|
> |flink.operator.KafkaSourceReader.topic.resources.partition.5.committedOffset|[topic:resources,partition:5]|
> |flink.operator.KafkaSourceReader.topic.resources.partition.6.committedOffset|[topic:resources,partition:6]|
> |...|...|
> Instead, the native way to represent metrics in Datadog would be:
> ||Metric Name||Tags||
> |flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:0]|
> |flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:1]|
> |flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:2]|
> |flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:3]|
> |flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:4]|
> |flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:5]|
> |flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:6]|
> |...|...|
> The recommended way to configure the scopes for the {{ComponentMetricGroup}}
> in [Datadog
> Docs|https://docs.datadoghq.com/integrations/flink/#metric-collection] is to
> remove all the scopes from the templates for the same reason.
>
> The metric identifier is built from the scopes, and the tags are built from
> the variables. The issue seems to come from groups being part of both the
> scopes and the user variables. We can work around this by creating a
> custom metric group for user-reported metrics, but there is no way to
> override it for metrics reported by Flink itself (in particular [native
> RocksDB|https://github.com/apache/flink/blob/664fdaeaccf910c587f3478dd80bb327b441e85a/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBNativeMetricMonitor.java#L78-L80]
> metrics and
> [Kafka|https://github.com/apache/flink/blob/99c2a415e9eeefafacf70762b6f54070f7911ceb/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/internals/AbstractFetcher.java#L501-L506]
> metrics).
>
> I couldn't think of a simple, clean, and backward-compatible way to achieve
> such a change, though, so I'm looking for feedback on how to proceed.
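
To make the difference between the two tables in the description concrete, here is a minimal standalone sketch in Java. It does not use or reproduce any Flink API: the class MetricNameSketch and both methods are hypothetical names, and it assumes the scope is available as a flat list of components and the variables as a plain key/value map, keyed the way the tags appear in the tables. The first method joins everything with dots, as the issue describes; the second drops every key/value pair from the scope that is already reported as a tag.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; none of these names exist in Flink. */
class MetricNameSketch {

    /** Joins all scope components and the metric name with dots (first table). */
    static String fullIdentifier(List<String> scope, String metricName) {
        List<String> parts = new ArrayList<>(scope);
        parts.add(metricName);
        return String.join(".", parts);
    }

    /**
     * Builds the identifier while skipping every adjacent (key, value) pair in
     * the scope that also appears in the variables map (second table).
     */
    static String tagFreeIdentifier(
            List<String> scope, Map<String, String> variables, String metricName) {
        List<String> parts = new ArrayList<>();
        int i = 0;
        while (i < scope.size()) {
            String key = scope.get(i);
            boolean isTagPair = i + 1 < scope.size()
                    && scope.get(i + 1).equals(variables.get(key));
            if (isTagPair) {
                i += 2; // key and value are already carried by a tag
            } else {
                parts.add(key);
                i++;
            }
        }
        parts.add(metricName);
        return String.join(".", parts);
    }

    public static void main(String[] args) {
        List<String> scope = List.of(
                "flink", "operator", "KafkaSourceReader",
                "topic", "resources", "partition", "0");
        Map<String, String> variables = new LinkedHashMap<>();
        variables.put("topic", "resources");
        variables.put("partition", "0");

        System.out.println(fullIdentifier(scope, "committedOffset"));
        System.out.println(tagFreeIdentifier(scope, variables, "committedOffset"));
    }
}
```

A reporter-side filter along these lines would change the reported names, so it does not by itself address the backward-compatibility concern raised above. It is only meant to show which scope components are redundant with the tags.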