[jira] [Updated] (FLINK-24756) Flink metric identifiers contain group variables.

Frederic Hemery (Jira) Wed, 03 Nov 2021 11:32:25 -0700


     [ https://issues.apache.org/jira/browse/FLINK-24756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Frederic Hemery updated FLINK-24756:
------------------------------------
    Description: 
Metric identifiers are built by concatenating the closest 
{{ComponentMetricGroup}} metric identifier (which is configurable) and the 
whole hierarchy of groups that have been added.

This poses a challenge in a monitoring system like Datadog, where it is tricky 
to aggregate across metric identifiers. Datadog instead relies on a single 
metric identifier with different tags to distinguish between time series.

 

Using Flink Datadog integration, we get:
||Metric Name||Tags||
|flink.operator.KafkaSourceReader.topic.resources.partition.0.committedOffset|[topic:resources,partition:0]|
|flink.operator.KafkaSourceReader.topic.resources.partition.1.committedOffset|[topic:resources,partition:1]|
|flink.operator.KafkaSourceReader.topic.resources.partition.2.committedOffset|[topic:resources,partition:2]|
|flink.operator.KafkaSourceReader.topic.resources.partition.3.committedOffset|[topic:resources,partition:3]|
|flink.operator.KafkaSourceReader.topic.resources.partition.4.committedOffset|[topic:resources,partition:4]|
|flink.operator.KafkaSourceReader.topic.resources.partition.5.committedOffset|[topic:resources,partition:5]|
|flink.operator.KafkaSourceReader.topic.resources.partition.6.committedOffset|[topic:resources,partition:6]|
|...|...|

Instead, the native way to represent metrics in Datadog would be:
||Metric Name||Tags||
|flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:0]|
|flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:1]|
|flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:2]|
|flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:3]|
|flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:4]|
|flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:5]|
|flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:6]|
|...|...|

The recommended way to configure the scopes for the {{ComponentMetricGroup}} in 
[Datadog Docs|https://docs.datadoghq.com/integrations/flink/#metric-collection] 
is to remove all the scopes from the templates for the same reason.
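
For context, a scope configuration without variables might look like the fragment below. The option keys are Flink's {{metrics.scope.*}} settings; the values are an assumption chosen to match the {{flink.operator}} prefix in the tables above, so the linked Datadog page remains the reference for the exact recommended values.

```yaml
# flink-conf.yaml (sketch; values assumed from the metric names shown above)
# No <host>, <job_name>, <operator_name>, ... variables in the scope formats,
# so those values are reported only as tags and not as part of the metric name.
metrics.scope.task: flink.task
metrics.scope.operator: flink.operator
```

This only controls the {{ComponentMetricGroup}} part of the identifier; groups added below it still end up in the metric name.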

 

The metric identifier is built from the scopes and the tags are built from the 
variables. The issue seems to come from groups being part of both the scopes 
and the user variables. We can work around this for user-reported metrics by 
creating a custom metric group, but this behavior cannot be overridden for 
metrics reported by Flink itself (in particular [native 
RocksDB|https://github.com/apache/flink/blob/664fdaeaccf910c587f3478dd80bb327b441e85a/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBNativeMetricMonitor.java#L78-L80]
 metrics and 
[Kafka|https://github.com/apache/flink/blob/99c2a415e9eeefafacf70762b6f54070f7911ceb/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/internals/AbstractFetcher.java#L501-L506]
 metrics).
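
The behavior described above can be illustrated with a small self-contained model. This is only a sketch and not Flink's implementation: {{MetricIdentifierSketch}}, {{ToyGroup}} and {{addVariableOnly}} are hypothetical names invented for illustration. {{addGroup(key, value)}} in the model adds the key and value to both the scope and the variables, which reproduces the first table; {{addVariableOnly}} shows what the second table would require.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MetricIdentifierSketch {

    /** Hypothetical stand-in for a metric group; not Flink's actual class. */
    static final class ToyGroup {
        private final List<String> scope;            // components joined into the identifier
        private final Map<String, String> variables; // key/value pairs reported as tags

        ToyGroup(String... baseScope) {
            this.scope = new ArrayList<>(Arrays.asList(baseScope));
            this.variables = new LinkedHashMap<>();
        }

        private ToyGroup(List<String> scope, Map<String, String> variables) {
            this.scope = scope;
            this.variables = variables;
        }

        /** Behavior described in the issue: key and value land in scope AND variables. */
        ToyGroup addGroup(String key, String value) {
            List<String> newScope = new ArrayList<>(scope);
            newScope.add(key);
            newScope.add(value);
            Map<String, String> newVariables = new LinkedHashMap<>(variables);
            newVariables.put(key, value);
            return new ToyGroup(newScope, newVariables);
        }

        /** Hypothetical alternative: key and value become a variable only. */
        ToyGroup addVariableOnly(String key, String value) {
            Map<String, String> newVariables = new LinkedHashMap<>(variables);
            newVariables.put(key, value);
            return new ToyGroup(new ArrayList<>(scope), newVariables);
        }

        /** Joins the scope components and the metric name with dots. */
        String identifier(String metricName) {
            return String.join(".", scope) + "." + metricName;
        }

        Map<String, String> tags() {
            return variables;
        }
    }

    public static void main(String[] args) {
        ToyGroup base = new ToyGroup("flink", "operator", "KafkaSourceReader");

        ToyGroup current = base.addGroup("topic", "resources").addGroup("partition", "0");
        ToyGroup desired =
                base.addVariableOnly("topic", "resources").addVariableOnly("partition", "0");

        // flink.operator.KafkaSourceReader.topic.resources.partition.0.committedOffset {topic=resources, partition=0}
        System.out.println(current.identifier("committedOffset") + " " + current.tags());
        // flink.operator.KafkaSourceReader.committedOffset {topic=resources, partition=0}
        System.out.println(desired.identifier("committedOffset") + " " + desired.tags());
    }
}
```

In both cases the tags are identical; only the identifier differs, which is why the partitions cannot be aggregated under one metric name today.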

 

I couldn't think of a simple, clean, and backward-compatible way to make such 
a change, so I'm looking for feedback on how to proceed.

  was:
Metric identifiers are built by concatenating the closest 
{{ComponentMetricGroup}} metric identifier (which is configurable) and the 
whole hierarchy of groups that have been added.

In a monitoring system like Datadog, it poses a challenge because it is tricky 
to aggregate across metric identifiers. Instead, it relies on the same metric 
identifier and different tags to distinguish between different timeseries.

 

Using Flink Datadog integration, we get:
||Metric Name||Tags||
|flink.operator.KafkaSourceReader.topic.resources.partition.0.committedOffset|[topic:resources,partition:0]|
|flink.operator.KafkaSourceReader.topic.resources.partition.1.committedOffset|[topic:resources,partition:1]|
|flink.operator.KafkaSourceReader.topic.resources.partition.2.committedOffset|[topic:resources,partition:2]|
|flink.operator.KafkaSourceReader.topic.resources.partition.3.committedOffset|[topic:resources,partition:3]|
|flink.operator.KafkaSourceReader.topic.resources.partition.4.committedOffset|[topic:resources,partition:4]|
|flink.operator.KafkaSourceReader.topic.resources.partition.5.committedOffset|[topic:resources,partition:5]|
|flink.operator.KafkaSourceReader.topic.resources.partition.6.committedOffset|[topic:resources,partition:6]|
|...|...|

Instead, the native way to represent metrics in Datadog would be:
||Metric Name||Tags||
|flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:0]|
|flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:1]|
|flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:2]|
|flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:3]|
|flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:4]|
|flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:5]|
|flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:6]|
|...|...|

The recommended way to configure the scopes for the {{ComponentMetricGroup}} in 
[Datadog Docs|https://docs.datadoghq.com/integrations/flink/#metric-collection] 
is to remove all the scopes from the templates for the same reason.

 

The metric identifier is built from the scopes and the tags are built from the 
variables. The issue seems to come from groups being part of both the scopes 
and the user variables. We can override this behavior by creating a custom 
metric group for user reported metrics but this is impossible to override for 
metrics reported by Flink itself (in particular [native 
RocksDB|https://github.com/apache/flink/blob/664fdaeaccf910c587f3478dd80bb327b441e85a/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBNativeMetricMonitor.java#L78-L80]
 metrics and 
[Kafka|https://github.com/apache/flink/blob/99c2a415e9eeefafacf70762b6f54070f7911ceb/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/internals/AbstractFetcher.java#L501-L506]
 metrics).

 

I couldn't think of a simple, clean and backward compatible way to achieve such 
a change though so I'm looking for feedback.


> Flink metric identifiers contain group variables.
> -------------------------------------------------
>
>                 Key: FLINK-24756
>                 URL: https://issues.apache.org/jira/browse/FLINK-24756
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Metrics
>            Reporter: Frederic Hemery
>            Priority: Major
>
> Metric identifiers are built by concatenating the closest 
> {{ComponentMetricGroup}} metric identifier (which is configurable) and the 
> whole hierarchy of groups that have been added.
> This poses a challenge in a monitoring system like Datadog, where it is 
> tricky to aggregate across metric identifiers. Datadog instead relies on a 
> single metric identifier with different tags to distinguish between time 
> series.
>  
> Using Flink Datadog integration, we get:
> ||Metric Name||Tags||
> |flink.operator.KafkaSourceReader.topic.resources.partition.0.committedOffset|[topic:resources,partition:0]|
> |flink.operator.KafkaSourceReader.topic.resources.partition.1.committedOffset|[topic:resources,partition:1]|
> |flink.operator.KafkaSourceReader.topic.resources.partition.2.committedOffset|[topic:resources,partition:2]|
> |flink.operator.KafkaSourceReader.topic.resources.partition.3.committedOffset|[topic:resources,partition:3]|
> |flink.operator.KafkaSourceReader.topic.resources.partition.4.committedOffset|[topic:resources,partition:4]|
> |flink.operator.KafkaSourceReader.topic.resources.partition.5.committedOffset|[topic:resources,partition:5]|
> |flink.operator.KafkaSourceReader.topic.resources.partition.6.committedOffset|[topic:resources,partition:6]|
> |...|...|
> Instead, the native way to represent metrics in Datadog would be:
> ||Metric Name||Tags||
> |flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:0]|
> |flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:1]|
> |flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:2]|
> |flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:3]|
> |flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:4]|
> |flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:5]|
> |flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:6]|
> |...|...|
> The recommended way to configure the scopes for the {{ComponentMetricGroup}} 
> in [Datadog 
> Docs|https://docs.datadoghq.com/integrations/flink/#metric-collection] is to 
> remove all the scopes from the templates for the same reason.
>  
> The metric identifier is built from the scopes and the tags are built from 
> the variables. The issue seems to come from groups being part of both the 
> scopes and the user variables. We can work around this for user-reported 
> metrics by creating a custom metric group, but this behavior cannot be 
> overridden for metrics reported by Flink itself (in particular [native 
> RocksDB|https://github.com/apache/flink/blob/664fdaeaccf910c587f3478dd80bb327b441e85a/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBNativeMetricMonitor.java#L78-L80]
>  metrics and 
> [Kafka|https://github.com/apache/flink/blob/99c2a415e9eeefafacf70762b6f54070f7911ceb/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/internals/AbstractFetcher.java#L501-L506]
>  metrics).
>  
> I couldn't think of a simple, clean, and backward-compatible way to make 
> such a change, so I'm looking for feedback on how to proceed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
