Robert Batts created NIFI-4713:
----------------------------------

             Summary: Datadog Metrics Alignment
                 Key: NIFI-4713
                 URL: https://issues.apache.org/jira/browse/NIFI-4713
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Extensions
    Affects Versions: 1.4.0
            Reporter: Robert Batts


Metrics that are being fed into Datadog from NiFi do not seem to align with the 
NiFi model. Therefore, I am proposing the following:

# Change the metric names to work better with Datadog
# Become more reliant on tagging
# Allow custom tagging

Currently, metrics are being sent to Datadog in the following format:

<metricsPrefix>.<processorName/flow>.<metricName>

However, Datadog works better when a metric name is reused and the results are 
filtered via tags. So in Datadog, a metric named <metricsPrefix>.<metricName> 
with a tag of <processorName> works better than one unique metric per 
processor (when there is no processorName, exclude the tag instead of adding 
'flow'). 
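
To make the difference between the two naming schemes concrete, here is a minimal Python sketch (not the reporting task's actual code). The function names and the `processor` tag key are assumptions for illustration only; the proposal above only says the processor name should become a tag.

```python
from typing import NamedTuple, Optional, Tuple


class Metric(NamedTuple):
    name: str
    tags: Tuple[str, ...]


def legacy_name(prefix: str, metric_name: str,
                processor_name: Optional[str] = None) -> str:
    # Current scheme: the processor name (or the literal 'flow') is
    # embedded in the metric name itself.
    return f"{prefix}.{processor_name or 'flow'}.{metric_name}"


def tagged_metric(prefix: str, metric_name: str,
                  processor_name: Optional[str] = None) -> Metric:
    # Proposed scheme: one shared metric name, with the processor carried
    # as a tag. The 'processor' tag key is a hypothetical choice.
    # With no processor name, the tag is omitted rather than set to 'flow'.
    tags = (f"processor:{processor_name}",) if processor_name else ()
    return Metric(f"{prefix}.{metric_name}", tags)
```

With this sketch, every processor reports under the same metric name and differs only in its tags, which is what lets Datadog filter and aggregate across processors.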

Consider the way Datadog does Kafka. The metric kafka.consumer_lag represents 
the current lag of a topic (tag) for a given consumer_group (tag) over all 
partitions (tag). 

For the same moment in time:
kafka.consumer_lag = 5 <topic:a, consumer_group:nifi, partition:0>
kafka.consumer_lag = 7 <topic:a, consumer_group:nifi, partition:1>
kafka.consumer_lag = 22 <topic:a, consumer_group:python, partition:0>
kafka.consumer_lag = 19 <topic:a, consumer_group:python, partition:1>
kafka.consumer_lag = 2 <topic:b, consumer_group:nifi, partition:0>

If I wanted to know the current lag for a given consumer_group on each topic, 
I would filter and group on those tags and then sum the remaining records 
(which would be a sum across the partitions). 

For the same moment in time:
kafka.consumer_lag = 12 for topic:a and consumer_group:nifi
kafka.consumer_lag = 2 for topic:b and consumer_group:nifi
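
The same aggregation can be sketched in Python. This only illustrates the filter-then-sum idea on the sample points above; it is not how Datadog implements it, and the `sum_by` helper is a hypothetical name.

```python
from collections import defaultdict

# The five sample points above, as (value, tags) pairs.
points = [
    (5, {"topic": "a", "consumer_group": "nifi", "partition": "0"}),
    (7, {"topic": "a", "consumer_group": "nifi", "partition": "1"}),
    (22, {"topic": "a", "consumer_group": "python", "partition": "0"}),
    (19, {"topic": "a", "consumer_group": "python", "partition": "1"}),
    (2, {"topic": "b", "consumer_group": "nifi", "partition": "0"}),
]


def sum_by(points, group_keys, **filters):
    """Keep points whose tags match `filters`, then sum per `group_keys`."""
    totals = defaultdict(int)
    for value, tags in points:
        if all(tags.get(key) == wanted for key, wanted in filters.items()):
            totals[tuple(tags[key] for key in group_keys)] += value
    return dict(totals)


# Group by topic and consumer_group; the partition tag is summed away.
lag = sum_by(points, ("topic", "consumer_group"), consumer_group="nifi")
print(lag)  # {('a', 'nifi'): 12, ('b', 'nifi'): 2}
```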

In a NiFi sense, this could allow you to (for example) have a tag marking a 
metric as coming from an aws-sqs pull and aggregate the average number of 
records being pulled across the entire system instead of on a single processor.

Additionally, there is room for custom tagging as well. For example: I want to 
be able to aggregate across all NiFi clusters I control. Setting a unique 
prefix for each cluster breaks this aggregation, while not setting a prefix 
might not allow me to filter properly later. But if custom tagging were 
allowed, I could set a tag of cluster_name:nifi-1 and then have all metrics 
aggregated while still being able to filter down to that specific cluster for 
other operations. In my opinion, the easiest way to implement this would be to 
take all non-required attributes from the Datadog controller and use them as 
the custom tags (these attributes should be considered final/static when 
loaded). The attributes are already in Key=Value format, so it should be easy 
enough to switch them over to Key:Value formatting for tagging (once the 
required attributes are removed).
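
As a rough illustration of that conversion, here is a short Python sketch. The set of required attribute names below is an assumption for the example and should be checked against the reporting task's real property list; `custom_tags` is a hypothetical helper.

```python
# Assumed names of the required attributes; verify against the actual
# reporting task before relying on them.
REQUIRED = frozenset({"Metrics prefix", "Environment", "API key",
                      "Datadog transport"})


def custom_tags(attributes, required=REQUIRED):
    """Turn every non-required Key=Value attribute into a Key:Value tag."""
    return sorted(
        f"{key}:{value}"
        for key, value in attributes.items()
        if key not in required
    )


attributes = {
    "Metrics prefix": "nifi",
    "API key": "<redacted>",
    "cluster_name": "nifi-1",
    "team": "data",
}
print(custom_tags(attributes))  # ['cluster_name:nifi-1', 'team:data']
```

Computing the tags once when the attributes are loaded matches the suggestion that they be treated as final/static.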

(Most if not all work for this is centered on 
org.apache.nifi.reporting.datadog.DataDogReportingTask)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
