[ 
https://issues.apache.org/jira/browse/NIFI-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Witt updated NIFI-4713:
---------------------------
    Fix Version/s: 1.14.0

> Datadog Metrics Alignment
> -------------------------
>
>                 Key: NIFI-4713
>                 URL: https://issues.apache.org/jira/browse/NIFI-4713
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Robert Batts
>            Priority: Major
>              Labels: datadog, metrics
>             Fix For: 1.14.0
>
>
> Metrics that are being fed into Datadog from NiFi do not seem to align with 
> the NiFi model. Therefore, I am proposing the following:
> # Change the metric names to work better with Datadog
> # Become more reliant on tagging
> # Allow custom tagging
> Currently, metrics are being sent to Datadog in the following format:
> <metricsPrefix>.<processorName/flow>.<metricName>
> However, Datadog is built around reusing a metric name and filtering via 
> tags. So in Datadog, a metric named <metricsPrefix>.<metricName> with a tag 
> of <processorName> works better than one unique metric per processor (when 
> there is no processorName, exclude the tag instead of adding 'flow'). 
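
The renaming proposed above can be sketched roughly as follows. This is a minimal Python illustration, not the reporting task's actual code (which is Java). The function name `to_tagged_metric`, the `processor` tag key, and the sample metric and processor names are assumptions made for the example.

```python
from typing import List, Optional, Tuple


def to_tagged_metric(
    prefix: str, metric: str, processor_name: Optional[str] = None
) -> Tuple[str, List[str]]:
    """Build (metric_name, tags) in the proposed <metricsPrefix>.<metricName> form."""
    name = f"{prefix}.{metric}"
    # Flow-level metrics have no processor, so no tag is attached at all
    # (rather than substituting 'flow'). The 'processor' key is hypothetical.
    tags = [f"processor:{processor_name}"] if processor_name else []
    return name, tags


# Hypothetical metric and processor names, for illustration only.
per_processor = to_tagged_metric("nifi", "FlowFilesReceived", "GetSQS")
flow_level = to_tagged_metric("nifi", "FlowFilesReceived")
```

With this shape, every processor reports under the same metric name and differs only by its tag.
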
> Consider the way Datadog does Kafka. The metric kafka.consumer_lag represents 
> the current lag of a topic (tag) for a given consumer_group (tag) over all 
> partitions (tag). 
> For the same moment in time:
> kafka.consumer_lag = 5 <topic:a, consumer_group:nifi, partition:0>
> kafka.consumer_lag = 7 <topic:a, consumer_group:nifi, partition:1>
> kafka.consumer_lag = 22 <topic:a, consumer_group:python, partition:0>
> kafka.consumer_lag = 19 <topic:a, consumer_group:python, partition:1>
> kafka.consumer_lag = 2 <topic:b, consumer_group:nifi, partition:0>
> If I wanted to know what the current lag was for a given consumer_group on 
> all topics, I would include those tags and then sum over the remaining 
> records (that is, across the partitions). 
> For the same moment in time:
> kafka.consumer_lag = 12 for topic:a and consumer_group:nifi
> kafka.consumer_lag = 2 for topic:b and consumer_group:nifi
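
The aggregation above can be reproduced locally with a short Python sketch. It only imitates the filter-then-sum behaviour on the five sample points from the description; it is not Datadog's query engine, and `sum_by` is a hypothetical helper written for this example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[int, Dict[str, str]]

# The five kafka.consumer_lag points listed in the description.
points: List[Point] = [
    (5, {"topic": "a", "consumer_group": "nifi", "partition": "0"}),
    (7, {"topic": "a", "consumer_group": "nifi", "partition": "1"}),
    (22, {"topic": "a", "consumer_group": "python", "partition": "0"}),
    (19, {"topic": "a", "consumer_group": "python", "partition": "1"}),
    (2, {"topic": "b", "consumer_group": "nifi", "partition": "0"}),
]


def sum_by(
    data: List[Point], group_keys: Sequence[str], **filters: str
) -> Dict[Tuple[str, ...], int]:
    """Keep points whose tags match the filters, then sum per group key."""
    totals: Dict[Tuple[str, ...], int] = defaultdict(int)
    for value, tags in data:
        if all(tags.get(key) == wanted for key, wanted in filters.items()):
            # Tags not named in group_keys (here: partition) are summed away.
            totals[tuple(tags[key] for key in group_keys)] += value
    return dict(totals)


lag_for_nifi = sum_by(points, ("topic",), consumer_group="nifi")
print(lag_for_nifi)  # {('a',): 12, ('b',): 2}
```
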
> In NiFi terms, this would let you (for example) add a tag noting that a 
> metric came from an aws-sqs pull and aggregate the average number of records 
> pulled across the entire system instead of for a single process.
> Additionally, there is room for custom tagging as well. For example: I want 
> to be able to aggregate across all NiFi clusters I control. Setting a unique 
> prefix for each cluster breaks this aggregation, and setting no prefix might 
> not allow me to filter properly later. But if custom tagging were allowed, I 
> could set a tag of cluster_name:nifi-1 and then have all metrics aggregated 
> while still being able to filter down to that specific cluster for other 
> operations. In my opinion, the easiest way to implement this would 
> be to take all non-required attributes from the Datadog controller and use 
> them as the custom tags (these attributes should be considered final/static 
> when loaded). The attributes are already in Key=Value format, so it should be 
> easy enough to switch them over to Key:Value formatting for tagging (once the 
> required attributes are removed).
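
A minimal Python sketch of the Key=Value to Key:Value conversion described above. The property names in `REQUIRED_PROPERTIES` are placeholders assumed for illustration, not the reporting task's confirmed property names, and `custom_tags` is a hypothetical helper.

```python
from typing import Dict, FrozenSet, List

# Placeholder names for the required properties (assumption for this sketch).
REQUIRED_PROPERTIES: FrozenSet[str] = frozenset({"Metrics prefix", "API key"})


def custom_tags(
    properties: Dict[str, str],
    required: FrozenSet[str] = REQUIRED_PROPERTIES,
) -> List[str]:
    """Turn every non-required Key=Value property into a Key:Value tag."""
    # Sorting keeps the tag order stable between runs.
    return [
        f"{key}:{value}"
        for key, value in sorted(properties.items())
        if key not in required
    ]


# Example configuration; the values are made up for illustration.
configured = {
    "Metrics prefix": "nifi",
    "API key": "not-a-real-key",
    "cluster_name": "nifi-1",
}
tags = custom_tags(configured)
print(tags)  # ['cluster_name:nifi-1']
```

Because the tags are computed once from the loaded configuration, they stay static for the lifetime of the task, which matches the final/static expectation stated above.
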
> (Most if not all work for this is centered on 
> org.apache.nifi.reporting.datadog.DataDogReportingTask)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
