[
https://issues.apache.org/jira/browse/FLINK-24756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Frederic Hemery updated FLINK-24756:
------------------------------------
Description:
Metric identifiers are built by concatenating the closest
{{ComponentMetricGroup}} metric identifier (which is configurable) and the
whole hierarchy of groups that have been added.
This poses a challenge in a monitoring system like Datadog, where it is tricky
to aggregate across metric identifiers. Datadog instead relies on a single
metric identifier with different tags to distinguish between time series.
Using the Flink Datadog integration, we get:
||Metric Name||Tags||
|flink.operator.KafkaSourceReader.topic.resources.partition.0.committedOffset|[topic:resources,partition:0]|
|flink.operator.KafkaSourceReader.topic.resources.partition.1.committedOffset|[topic:resources,partition:1]|
|flink.operator.KafkaSourceReader.topic.resources.partition.2.committedOffset|[topic:resources,partition:2]|
|flink.operator.KafkaSourceReader.topic.resources.partition.3.committedOffset|[topic:resources,partition:3]|
|flink.operator.KafkaSourceReader.topic.resources.partition.4.committedOffset|[topic:resources,partition:4]|
|flink.operator.KafkaSourceReader.topic.resources.partition.5.committedOffset|[topic:resources,partition:5]|
|flink.operator.KafkaSourceReader.topic.resources.partition.6.committedOffset|[topic:resources,partition:6]|
|...|...|
Instead, the native way to represent metrics in Datadog would be:
||Metric Name||Tags||
|flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:0]|
|flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:1]|
|flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:2]|
|flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:3]|
|flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:4]|
|flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:5]|
|flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:6]|
|...|...|
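To make the difference between the two tables concrete, here is a minimal sketch in plain Java. It is a simplified model and not Flink's actual metric classes: {{GroupModel}} and its methods are hypothetical names, and it assumes that each key-value group appends both its key and its value to the scope while also registering a variable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified, hypothetical model of a metric group hierarchy (not Flink code). */
final class GroupModel {
    // Scope of the closest component group, e.g. flink.operator.KafkaSourceReader.
    private final List<String> baseScope;
    // Key-value groups added below it, in insertion order.
    private final Map<String, String> variables = new LinkedHashMap<>();

    GroupModel(String... baseScope) {
        this.baseScope = new ArrayList<>(Arrays.asList(baseScope));
    }

    /** Assumption: a key-value group feeds both the scope and the variables. */
    GroupModel addGroup(String key, String value) {
        variables.put(key, value);
        return this;
    }

    /** Identifier as in the first table: base scope, every key and value, then the name. */
    String currentIdentifier(String metricName) {
        List<String> parts = new ArrayList<>(baseScope);
        variables.forEach((key, value) -> {
            parts.add(key);
            parts.add(value);
        });
        parts.add(metricName);
        return String.join(".", parts);
    }

    /** Identifier as in the second table: key-value groups are left to the tags. */
    String tagOnlyIdentifier(String metricName) {
        List<String> parts = new ArrayList<>(baseScope);
        parts.add(metricName);
        return String.join(".", parts);
    }

    /** Tags in key:value form, built from the variables only. */
    List<String> tags() {
        List<String> tags = new ArrayList<>();
        variables.forEach((key, value) -> tags.add(key + ":" + value));
        return tags;
    }

    public static void main(String[] args) {
        GroupModel group = new GroupModel("flink", "operator", "KafkaSourceReader")
                .addGroup("topic", "resources")
                .addGroup("partition", "0");
        System.out.println(group.currentIdentifier("committedOffset"));
        System.out.println(group.tagOnlyIdentifier("committedOffset"));
        System.out.println(group.tags());
    }
}
```

In this model the tags are identical in both cases; only the identifier changes, which is why a single metric name can then be aggregated across partitions by tag.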
For the same reason, the
[Datadog Docs|https://docs.datadoghq.com/integrations/flink/#metric-collection]
recommend removing all the scopes from the {{ComponentMetricGroup}} scope templates.
The metric identifier is built from the scopes and the tags are built from the
variables. The issue seems to come from groups being part of both the scopes
and the user variables. For user-reported metrics, we can work around this by
creating a custom metric group, but there is no way to do so for
metrics reported by Flink itself (in particular [native
RocksDB|https://github.com/apache/flink/blob/664fdaeaccf910c587f3478dd80bb327b441e85a/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBNativeMetricMonitor.java#L78-L80]
metrics and
[Kafka|https://github.com/apache/flink/blob/99c2a415e9eeefafacf70762b6f54070f7911ceb/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/internals/AbstractFetcher.java#L501-L506]
metrics).
I couldn't think of a simple, clean, and backward-compatible way to make such
a change, so I'm looking for feedback on how to proceed.
> Flink metric identifiers contain group variables.
> -------------------------------------------------
>
> Key: FLINK-24756
> URL: https://issues.apache.org/jira/browse/FLINK-24756
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Metrics
> Reporter: Frederic Hemery
> Priority: Major
>