[ 
https://issues.apache.org/jira/browse/FLINK-25164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17453447#comment-17453447
 ] 

Yaroslav Tkachenko commented on FLINK-25164:
--------------------------------------------

[~chesnay] yes! That, 
[https://github.com/drivetribe/flink-metrics-datadog-statsd] and a few others. 
I believe there is demand for a proper reporter that ships with Flink out of 
the box and follows best practices. 

The package that you mentioned has a few issues IMO, like:
 * Its histogram representation is not compatible with what Datadog expects. 
DatadogHttpReporter does it properly.
 * Negative values are not handled properly ([this 
assumption|https://github.com/aroch/flink-metrics-dogstatsd/blob/3cb371acc7decbc923271b0d397107beae144db2/src/main/java/com/aroch/flink/metrics/dogstatsd/DogStatsDReporter.java#L219-L222]
 is correct for StatsD, but not for DogStatsD).
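
To illustrate the negative-value point, here is a minimal sketch of how a gauge line could be built for each protocol. The class and method names are hypothetical and are not taken from Flink or from the package above. It assumes the usual StatsD workaround (a signed gauge value is read as a delta, so a negative absolute value is sent as a reset to zero followed by the value) and, as described in this issue, that DogStatsD reads the signed value as absolute.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Flink or any reporter. */
final class GaugeLines {

    /**
     * Plain StatsD: a gauge value with a leading sign is treated as a delta,
     * so a negative absolute value is sent as "reset to 0" plus the value.
     */
    static List<String> statsd(String name, long value) {
        List<String> lines = new ArrayList<>();
        if (value < 0) {
            lines.add(name + ":0|g");
        }
        lines.add(name + ":" + value + "|g");
        return lines;
    }

    /**
     * Assumption taken from this issue: DogStatsD interprets the signed value
     * as absolute, so a single line is enough and no reset is sent.
     */
    static List<String> dogStatsd(String name, long value) {
        List<String> lines = new ArrayList<>();
        lines.add(name + ":" + value + "|g");
        return lines;
    }
}
```

With these assumptions, a value of -5 yields two lines for StatsD and one for DogStatsD. Sending the StatsD pair to a DogStatsD agent would briefly report the gauge as 0.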

> DogStatsD Metrics Reporter
> --------------------------
>
>                 Key: FLINK-25164
>                 URL: https://issues.apache.org/jira/browse/FLINK-25164
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / Metrics
>            Reporter: Yaroslav Tkachenko
>            Priority: Major
>              Labels: pull-request-available
>
> At the moment Flink doesn't have a metrics reporter that works out of the box 
> with the very popular Datadog agents. Datadog agents use the DogStatsD 
> protocol, which is a superset of StatsD. The existing StatsDReporter is too 
> limited to be useful. 
> https://issues.apache.org/jira/browse/FLINK-7009 attempted to address this 
> issue by introducing a separate mode in the StatsDReporter. However, I don't 
> believe it's possible to extend it to support DogStatsD due to a few core 
> differences:
>  * ALL metrics in StatsDReporter are reported as gauges, which makes counters 
> wrong
>  * Negative values are interpreted as reductions instead of absolute values, 
> which is not true in the case of DogStatsD
>  * The list of histogram metrics is not compatible with [the way they're 
> represented in 
> Datadog|https://docs.datadoghq.com/developers/metrics/types/?tab=histogram]
> I think this warrants having a separate metrics reporter dedicated to the 
> DogStatsD protocol.
> Also, most of the changes originally proposed in 
> https://issues.apache.org/jira/browse/FLINK-7009 are still relevant:
>  * convert output to ASCII alphanumeric characters and underscores, delimited 
> by periods. Runs of invalid characters within a metric segment would be 
> collapsed to a single underscore.
>  * report all Flink variables as tags
>  * compress overly long segments, say over 50 chars, to a symbolic 
> representation of the metric name, to preserve the unique metric time series 
> but avoid downstream truncation
>  * compress 32-character Flink IDs like tm_id, task_id, job_id, 
> task_attempt_id, to the first 8 characters, again to preserve enough 
> distinction amongst metrics while trimming up to 96 characters from the metric
>  * remove object references from names, such as the instance hash id of the 
> serializer
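
The naming rules quoted above can be sketched roughly as follows. This is an illustration under stated assumptions, not the proposed reporter: the class and method names are hypothetical, Flink IDs are assumed to be 32 hexadecimal characters, variable keys are assumed to arrive without Flink's angle brackets, and the output uses the DogStatsD datagram layout `name:value|c|#tag:value,...` for a counter increment.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical helper for illustration only; not the reporter proposed in this issue. */
final class DogStatsdNames {

    /** Collapses each run of characters outside [A-Za-z0-9_] to a single underscore. */
    static String sanitizeSegment(String segment) {
        return segment.replaceAll("[^A-Za-z0-9_]+", "_");
    }

    /** Sanitizes every period-delimited segment and keeps the periods as delimiters. */
    static String sanitizeName(String name) {
        return Arrays.stream(name.split("\\."))
                .map(DogStatsdNames::sanitizeSegment)
                .collect(Collectors.joining("."));
    }

    /** Assumption: a Flink ID is 32 hex characters; keep only the first 8. */
    static String shortenId(String value) {
        return value.matches("[0-9a-fA-F]{32}") ? value.substring(0, 8) : value;
    }

    /** Renders variables as tags, sorted by key so the output is deterministic. */
    static String tags(Map<String, String> variables) {
        return new TreeMap<>(variables).entrySet().stream()
                .map(e -> sanitizeSegment(e.getKey()) + ":" + shortenId(e.getValue()))
                .collect(Collectors.joining(","));
    }

    /** Builds a counter datagram carrying the increment since the last report. */
    static String counterLine(String name, long delta, Map<String, String> variables) {
        String tags = tags(variables);
        return sanitizeName(name) + ":" + delta + "|c" + (tags.isEmpty() ? "" : "|#" + tags);
    }
}
```

Reporting the increment with type `c` rather than the running total with type `g` is what separates counters from gauges on the wire. Tag values other than IDs are passed through unchanged in this sketch; a real reporter would likely need to sanitize them as well.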



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
