Robert Batts created NIFI-4713:
----------------------------------
Summary: Datadog Metrics Alignment
Key: NIFI-4713
URL: https://issues.apache.org/jira/browse/NIFI-4713
Project: Apache NiFi
Issue Type: Improvement
Components: Extensions
Affects Versions: 1.4.0
Reporter: Robert Batts
Metrics that are being fed into Datadog from NiFi do not seem to align with the
NiFi model. Therefore, I am proposing the following:
# Change the metric names to work better with Datadog
# Become more reliant on tagging
# Allow custom tagging
Currently, metrics are being sent to Datadog in the following format:
<metricsPrefix>.<processorName/flow>.<metricName>
However, Datadog is designed around reusing one metric name and filtering by
tags. So in Datadog, a metric named <metricsPrefix>.<metricName> with a tag of
<processorName> works better than one unique metric per processor. (Where there
is no processorName, omit the tag instead of adding 'flow'.)
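
The renaming described above could be sketched roughly as follows. This is only
an illustration in Python, not the reporting task's actual code: the function
name to_tagged_metric, the tag key "processor", and the sample metric and
processor names are all assumptions made for the example.

```python
from typing import List, Optional, Tuple


def to_tagged_metric(
    prefix: str, processor_name: Optional[str], metric_name: str
) -> Tuple[str, List[str]]:
    """Build the proposed <metricsPrefix>.<metricName> name plus tags.

    The "processor" tag key is a hypothetical choice for this sketch.
    """
    name = f"{prefix}.{metric_name}"
    tags = []
    # Flow-level metrics have no processor, so no tag is added
    # (rather than inventing a 'flow' value).
    if processor_name:
        tags.append(f"processor:{processor_name}")
    return name, tags


# Hypothetical sample values, for illustration only.
per_processor = to_tagged_metric("nifi", "GetSQS", "FlowFilesReceived")
flow_level = to_tagged_metric("nifi", None, "FlowFilesReceived")
```

Both calls yield the same metric name, so the flow-level and per-processor
series can be told apart by tag alone.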
Consider the way Datadog does Kafka. The metric kafka.consumer_lag represents
the current lag of a topic (tag) for a given consumer_group (tag) over all
partitions (tag).
For the same moment in time:
kafka.consumer_lag = 5 <topic:a, consumer_group:nifi, partition:0>
kafka.consumer_lag = 7 <topic:a, consumer_group:nifi, partition:1>
kafka.consumer_lag = 22 <topic:a, consumer_group:python, partition:0>
kafka.consumer_lag = 19 <topic:a, consumer_group:python, partition:1>
kafka.consumer_lag = 2 <topic:b, consumer_group:nifi, partition:0>
If I wanted to know what the current lag was for a given consumer_group on all
topics, I would filter on those tags and then sum over the remaining records
(that is, across the partitions).
For the same moment in time:
kafka.consumer_lag = 12 for topic:a and consumer_group:nifi
kafka.consumer_lag = 2 for topic:b and consumer_group:nifi
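
The filter-then-sum step can be reproduced with a small Python sketch. It only
mimics the aggregation locally on the five sample points above; it does not
call Datadog, and the helper name sum_by is an assumption for this example.

```python
from collections import defaultdict

# (value, tags) pairs copied from the kafka.consumer_lag example above.
points = [
    (5, {"topic": "a", "consumer_group": "nifi", "partition": "0"}),
    (7, {"topic": "a", "consumer_group": "nifi", "partition": "1"}),
    (22, {"topic": "a", "consumer_group": "python", "partition": "0"}),
    (19, {"topic": "a", "consumer_group": "python", "partition": "1"}),
    (2, {"topic": "b", "consumer_group": "nifi", "partition": "0"}),
]


def sum_by(points, filters, group_keys):
    """Keep points whose tags match `filters`, then sum per `group_keys`."""
    totals = defaultdict(int)
    for value, tags in points:
        if all(tags.get(k) == v for k, v in filters.items()):
            key = tuple(tags[k] for k in group_keys)
            totals[key] += value
    return dict(totals)


# Filter to consumer_group:nifi, group by topic, sum across partitions.
lag = sum_by(points, {"consumer_group": "nifi"}, ["topic"])
print(lag)  # {('a',): 12, ('b',): 2}
```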
In a NiFi sense, this could allow you to (for example) tag a metric as coming
from an aws-sqs pull and aggregate the average number of records being pulled
across the entire system instead of on a single processor.
Additionally, there is room for custom tagging as well. For example: I want to
be able to aggregate across all NiFi clusters I control. Setting a unique
prefix for each cluster breaks this aggregation, while setting no prefix might
not let me filter properly later. But if custom tagging were allowed, I could
set a tag such as cluster_name:nifi-1 and have all metrics aggregated while
still being able to filter down to that specific cluster for other operations.
In my opinion, the easiest way to implement this would be to take all
non-required attributes from the Datadog controller and use them as the custom
tags (these attributes should be considered final/static when loaded). The
attributes are already in Key=Value format, so it should be easy enough to
switch them over to Key:Value formatting for tagging (once the required
attributes are removed).
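
A rough sketch of that conversion, written in Python for brevity rather than
as the eventual Java change: the helper name custom_tags and the entries in
REQUIRED_PROPERTIES are placeholders assumed for this example, and the real
set of required attributes would come from the reporting task itself.

```python
# Placeholder names for the required attributes; assumed for this sketch.
REQUIRED_PROPERTIES = {"Metrics prefix", "Environment", "API key"}


def custom_tags(properties, required=REQUIRED_PROPERTIES):
    """Turn every non-required Key=Value attribute into a Key:Value tag."""
    return [
        f"{key}:{value}"
        for key, value in sorted(properties.items())
        if key not in required
    ]


# Hypothetical controller attributes, for illustration only.
configured = {
    "Metrics prefix": "nifi",
    "Environment": "prod",
    "cluster_name": "nifi-1",
    "team": "data-eng",
}
tags = custom_tags(configured)
print(tags)  # ['cluster_name:nifi-1', 'team:data-eng']
```

Computing the list once at load time would match the suggestion that these
attributes be treated as final/static.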
(Most if not all work for this is centered on
org.apache.nifi.reporting.datadog.DataDogReportingTask)