[
https://issues.apache.org/jira/browse/NIFI-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592352#comment-16592352
]
Corey Fritz edited comment on NIFI-5535 at 8/25/18 12:52 AM:
-------------------------------------------------------------
So I attempted to fix the tagging issue, and I did, but that only exacerbated
another problem: the DataDogReportingTask sends far too many metrics, with far
too many tags. Each processor generates 6 metrics with 2 tags each, each port
generates 9 metrics with 5 tags each, and each connection generates 6 metrics
with 8 tags each, plus 10 aggregated flow-level metrics and 13 JVM metrics,
each with 2 tags. Datadog considers each unique combination of a metric name +
tag to be a "custom metric", and the lowest Datadog plan allows an average of
100 "custom metrics" per host (some hosts can have more and some less, as long
as the total works out to 100 per host).
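To make the arithmetic concrete, here's a back-of-the-envelope sketch (plain Java, not NiFi code) of how fast the metric count grows, using the per-component counts observed above; note this counts metric names only, and the tag combinations multiply the "custom metric" count further:

```java
// Rough estimate of how many metrics the DataDogReportingTask emits for a
// flow, using the per-component counts observed above. Datadog bills each
// unique metric-name + tag combination as a separate "custom metric", so the
// real custom-metric count is higher still.
public class CustomMetricEstimate {
    static long estimate(int processors, int ports, int connections) {
        long processorMetrics = processors * 6L;   // 6 metrics per processor
        long portMetrics = ports * 9L;             // 9 metrics per port
        long connectionMetrics = connections * 6L; // 6 metrics per connection
        long flowMetrics = 10L;                    // aggregated flow-level metrics
        long jvmMetrics = 13L;                     // JVM metrics
        return processorMetrics + portMetrics + connectionMetrics
                + flowMetrics + jvmMetrics;
    }

    public static void main(String[] args) {
        // A 30-processor flow blows past a 100-custom-metrics-per-host budget
        // even before any ports or connections are counted.
        System.out.println(estimate(30, 0, 0)); // 203
    }
}
```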
I have a flow with about 30 processors that resulted in 370 metrics (I didn't
bother to count the tags) being sent to Datadog. I noticed that some of the
metrics I actually wanted to monitor were not showing up in Datadog, and I'm
sure it's because we're far over our limit.
There should probably be an opt-in strategy for identifying which sets of
metrics we want to send to Datadog.
So... my proposal is this (and I'm willing to tackle this as time allows):
1. Add an _Enable Monitoring_ property to all processors that is off by default
2. Add an _Enable Monitoring_ property to all ports that is off by default
3. Add an _Enable Monitoring_ property to all connections that is off by default
4. Add the following properties to the DataDogReportingTask:
* _Enable Flow-level Monitoring_, off by default
* _Enable JVM Monitoring_, off by default
5. Update the DataDogReportingTask to only submit metrics for components that
have had monitoring explicitly enabled
6. Update the DataDogReportingTask to remove all metric tags except for
_Environment_. I just don't see much value in any of the other tags.
This seems like a pretty large refactoring with a wide scope, since it would
touch processors, ports, and connections, as well as the other metric reporting
services, so I'd like to discuss it further with someone before proceeding.
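A minimal sketch of item 5, the opt-in filtering (the `ComponentStatus` record and `monitoringEnabled` flag below are hypothetical stand-ins, not existing NiFi API — the real task would read the proposed _Enable Monitoring_ property from each component):

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: the reporting task would skip any component whose
// "Enable Monitoring" property is off, so only explicitly opted-in
// components contribute custom metrics.
public class MonitoredFilter {
    // Stand-in for a NiFi component status plus its monitoring flag.
    record ComponentStatus(String id, boolean monitoringEnabled) {}

    static List<ComponentStatus> monitored(List<ComponentStatus> all) {
        return all.stream()
                .filter(ComponentStatus::monitoringEnabled)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<ComponentStatus> all = List.of(
                new ComponentStatus("proc-1", true),
                new ComponentStatus("proc-2", false));
        // Only proc-1 would have metrics submitted.
        System.out.println(monitored(all).size()); // 1
    }
}
```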
> DataDogReportingTask is not tagging metrics properly
> ----------------------------------------------------
>
> Key: NIFI-5535
> URL: https://issues.apache.org/jira/browse/NIFI-5535
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Affects Versions: 1.7.1
> Reporter: Corey Fritz
> Priority: Major
> Attachments: Screen Shot 2018-08-19 at 12.33.58 AM.png
>
>
> The current (and looks like original) implementation of the
> DataDogReportingTask is not applying metric tags correctly, and as a result,
> the "Environment" configuration property on that task does not work. This
> means that you're not going to be able to use tags to differentiate the
> metric values coming from different environments.
> Currently, every metric reported by this task gets the same set of tags
> applied:
> {code:java}
> connection-destination-id
> connection-destination-name
> connection-group-id
> connection-id
> connection-name
> connection-source-id
> connection-source-name
> dataflow_id
> env
> port-group-id
> port-id
> port-name{code}
> This list is defined here:
> [https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-datadog-bundle/nifi-datadog-reporting-task/src/main/java/org/apache/nifi/reporting/datadog/metrics/MetricsService.java#L111-L126]
> I've attached a screenshot from Datadog demonstrating a JVM metric with all
> of these tags applied.
> Each of these tags should include a value, e.g. "env:dev" instead of just
> "env".
> Other observations:
> * it doesn't make sense to attach the _connection-_ and _port-_ tags to JVM
> metrics
> * I'm not sure I see any value in the _dataflow_id_ tag
> I was hoping for a quick fix when I noticed the environment tagging wasn't
> working, but after reviewing the code I think a nontrivial refactoring will
> be required. I'll try to tackle this if/when time allows.
> See here for more context on Datadog tagging:
> [https://docs.datadoghq.com/tagging]
>
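The "env:dev" point above boils down to Datadog tags being plain strings in `key:value` form; a bare key like "env" carries no value to filter on. A minimal sketch (not the actual MetricsService code) of what well-formed tags look like:

```java
import java.util.List;

// Sketch only: Datadog treats a tag as "key:value"; a tag with no colon is
// just a valueless key, which is why the bare "env" tag can't distinguish
// environments. The fix is to always emit key and value together.
public class TagFormat {
    static String tag(String key, String value) {
        return key + ":" + value;
    }

    public static void main(String[] args) {
        // "abc123" is a placeholder value for illustration.
        List<String> tags = List.of(tag("env", "dev"), tag("dataflow_id", "abc123"));
        System.out.println(tags); // [env:dev, dataflow_id:abc123]
    }
}
```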
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)