[ https://issues.apache.org/jira/browse/FLINK-6911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Dail updated FLINK-6911:
------------------------------
    Description: 
The StatsDReporter does not escape spaces in the metric name. It is generally 
accepted that spaces in the metric name are a bad idea:

https://stackoverflow.com/questions/29674488/whitespace-in-statsd-metric-name

Note also that Flink's StatsDReporter is based on the ReadyTalk StatsD 
implementation (a comment in the source says so), and the ReadyTalk 
implementation does replace whitespace:
https://github.com/ReadyTalk/metrics-statsd/blob/master/metrics-statsd-common/src/main/java/com/readytalk/metrics/StatsD.java#L129
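
As a rough illustration of the kind of whitespace replacement meant here, the 
following is a minimal sketch only. The class MetricNameSanitizer and its 
sanitize method are hypothetical names, not code from Flink or ReadyTalk, and 
the underscore replacement character is an arbitrary assumption:

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Flink or ReadyTalk. */
final class MetricNameSanitizer {
    // Matches one or more whitespace characters (spaces, tabs, newlines).
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    private MetricNameSanitizer() {}

    /** Replaces each run of whitespace with a single underscore (assumed choice). */
    static String sanitize(String name) {
        return WHITESPACE.matcher(name).replaceAll("_");
    }

    public static void main(String[] args) {
        // A name in the "Sink- <name>" style mentioned in this issue.
        System.out.println(sanitize("Sink- my sink"));
    }
}
```

A reporter could apply something like this to each name component before 
joining the components into the final metric name.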

Specifically, I am integrating with Telegraf. Its parser splits each metric 
line on spaces and treats the resulting fields as (name, value, timestamp), so 
only the text before the first space is kept as the name.
https://github.com/influxdata/telegraf/blob/master/plugins/parsers/graphite/parser.go#L225
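
To make the effect concrete, here is a simplified sketch that mimics a parser 
splitting on whitespace. It is not Telegraf's actual code; the class 
LineSplitDemo, the method parsedName, and the sample metric names are made up 
for illustration:

```java
/** Hypothetical demo of a whitespace-splitting parser; not Telegraf's code. */
final class LineSplitDemo {
    private LineSplitDemo() {}

    /** Returns the first whitespace-separated field, which such a parser would treat as the name. */
    static String parsedName(String line) {
        String[] fields = line.trim().split("\\s+");
        return fields[0];
    }

    public static void main(String[] args) {
        // Two distinct (made-up) metric names that differ only after the space.
        String first = "taskmanager.myJob.Sink- sinkA.numRecordsIn";
        String second = "taskmanager.myJob.Sink- sinkB.numRecordsIn";

        // Both collapse to the same truncated name once split on whitespace.
        System.out.println(parsedName(first));
        System.out.println(parsedName(second));
    }
}
```

Under this assumption, any two metric names that share the same prefix before 
the first space become indistinguishable after parsing.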

I first hit this issue when I had a space in the job name. Flink encodes the 
job name into the metric name as is, so when I sent these metrics to Telegraf, 
all of the job-level metrics ended up in the same bucket.

Flink also uses names like "Sink- <name>" and "Source- <name>" to encode 
sources and sinks. These do not work with Telegraf either. I end up with 
metrics that look like this inside Telegraf:

{noformat}
taskmanager_5e453417d87c755da6311b1940cc602f_TurbineHeatProcessor_examples_turbineHeatTest_Sink-
{noformat}

The actual name is truncated after the space.


> StatsD Metrics name should escape spaces 
> -----------------------------------------
>
>                 Key: FLINK-6911
>                 URL: https://issues.apache.org/jira/browse/FLINK-6911
>             Project: Flink
>          Issue Type: Improvement
>          Components: Metrics
>    Affects Versions: 1.3.0
>         Environment: StatsD Metrics with Telegraf server
>            Reporter: Chris Dail



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
