The PrometheusReportingTask and the NiFi API endpoint for Prometheus
metrics are different beasts but they use quite a bit of the same code
[1]. The intent is to report on the same metrics wherever possible,
and I think for the most part we've done that. They don't call each
other; instead, each gets its own copy of the metrics registries and
populates them when triggered. For the REST endpoint, it's
done on-demand. For the Reporting Task, it's done when scheduled. The
Reporting Task came first to provide a way for Prometheus to scrape a
NiFi instance. But as Reporting Tasks are system-level controller
services, they don't get exported to templates, possibly require
configuration after manual instantiation, etc. To that end the REST
endpoint was added, as it gets all the security, configuration, and
HTTP server "for free" so to speak. Also, I think the "totals" metrics
from the REST endpoint might cover the whole cluster, whereas those
from the Reporting Task might only cover the local node, but I'm not
positive.

For some of the metrics you added, aren't they constants based on
properties or other settings? If so, we probably didn't add them
because they weren't useful metrics on their own, but there is
precedent for adding such static metrics for the purposes of
downstream queries (used / max * 100% for example).
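
As a rough sketch of that kind of downstream calculation (the class and
method names below are made up for illustration and are not NiFi code):

```java
public class RepositoryUsage {

    // Hypothetical helper: percentage of a repository in use, computed
    // from a "used" gauge and a static "max" gauge.
    public static double percentUsed(double usedBytes, double maxBytes) {
        if (maxBytes <= 0) {
            // No meaningful ratio without a positive maximum.
            return Double.NaN;
        }
        return usedBytes / maxBytes * 100.0;
    }

    public static void main(String[] args) {
        // 256 of 1024 bytes used
        System.out.println(percentUsed(256, 1024)); // prints 25.0
    }
}
```

The static "max" value is not interesting by itself, but without it the
ratio can't be computed on the monitoring side.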

The other ones (besides repository info) were possibly just oversights
but if they are helpful metrics, then please feel free to add them.
You should find that you can update the appropriate Registry classes
as well as PrometheusMetricsUtil in nifi-prometheus-utils, and if no
new registries are added, I believe both the REST endpoint and the
Reporting Task will have the new metrics. If you do need to add a
registry (NiFiRepositoryMetricsRegistry for example), you'd want to
follow the same pattern as the others and make the call to
PrometheusMetricsUtil.createNiFiRepositoryMetrics() from both the
endpoint [2] and the reporting task [3].
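
To illustrate the shape of that pattern, here is a minimal,
self-contained sketch. It does not use the real NiFi or Prometheus
client classes: the registry class, the method signatures, and the
"repository" label name are all assumptions for illustration. The point
is only that one shared helper populates a registry, and each caller
passes in its own registry instance.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RepositoryMetricsSketch {

    // Hypothetical stand-in for a registry such as
    // NiFiRepositoryMetricsRegistry: it just holds named gauge values.
    public static final class RepositoryMetricsRegistry {
        private final Map<String, Double> gauges = new LinkedHashMap<>();

        void setGauge(String name, String repository, double value) {
            // The label name "repository" is an assumption.
            gauges.put(name + "{repository=\"" + repository + "\"}", value);
        }

        Map<String, Double> snapshot() {
            return new LinkedHashMap<>(gauges);
        }
    }

    // Hypothetical stand-in for the shared helper that would live in
    // PrometheusMetricsUtil; the real signature will differ.
    public static RepositoryMetricsRegistry createRepositoryMetrics(
            RepositoryMetricsRegistry registry, String repository,
            long usedBytes, long maxBytes) {
        registry.setGauge("nifi_repository_used_bytes", repository, usedBytes);
        registry.setGauge("nifi_repository_max_bytes", repository, maxBytes);
        return registry;
    }

    // REST endpoint side: populates its own registry on demand.
    public static Map<String, Double> onRestRequest(long used, long max) {
        RepositoryMetricsRegistry registry = new RepositoryMetricsRegistry();
        return createRepositoryMetrics(registry, "content", used, max).snapshot();
    }

    // Reporting Task side: populates its own registry when scheduled.
    public static Map<String, Double> onReportingTaskTrigger(long used, long max) {
        RepositoryMetricsRegistry registry = new RepositoryMetricsRegistry();
        return createRepositoryMetrics(registry, "content", used, max).snapshot();
    }

    public static void main(String[] args) {
        System.out.println(onRestRequest(512L, 1024L));
        System.out.println(onReportingTaskTrigger(512L, 1024L));
    }
}
```

Because both callers go through the same helper, a metric added there
shows up in both places; only a brand-new registry needs the extra
wiring at each call site.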

Last thing I'll mention is that we're using Dropwizard for most of
these metrics, currently at version 4.1.2 but the latest 4.1.x release
is 4.1.17. We might consider an upgrade while adding these metrics.
Not much has been done in the metrics-jvm module since 4.1.2, but a
couple of new metrics were added [4], which we could expose as well.

Regards,
Matt

[1] https://github.com/apache/nifi/tree/main/nifi-nar-bundles/nifi-extension-utils/nifi-prometheus-utils/src/main/java/org/apache/nifi/prometheus/util
[2] https://github.com/apache/nifi/blob/main/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-web/nifi-web-api/src/main/java/org/apache/nifi/web/StandardNiFiServiceFacade.java#L5380
[3] https://github.com/apache/nifi/blob/main/nifi-nar-bundles/nifi-prometheus-bundle/nifi-prometheus-reporting-task/src/main/java/org/apache/nifi/reporting/prometheus/PrometheusReportingTask.java#L134
[4] https://github.com/dropwizard/metrics/commit/ccc91ef1ade1975d58595f23caa48d5ed68a6b54#diff-42e4dfff08e984191adc05ecf744f324f7a9039f72e26bafcb779876584e9e7b

On Tue, Feb 9, 2021 at 2:49 AM Noble Numbat <noblenumbat...@gmail.com> wrote:
>
> Hi everyone,
>
> We have added metrics to the Prometheus metrics endpoint in the API
> (/nifi-api/flow/metrics/prometheus) to improve programmatic access to
> NiFi metrics for the purpose of monitoring. We’d like to contribute
> these back to the project for the benefit of others. Please find the
> list of metrics below.
>
> Before I open a JIRA ticket and pull request, I have some questions to
> clarify my understanding and determine what else I will need to add to
> the code.
> 1. How are the use cases different between the Prometheus metrics
> endpoint in the API (/nifi-api/flow/metrics/prometheus) and the
> PrometheusReportingTask? I note that the metrics are almost identical
> between the two.
> 2. Is the intent to keep the metrics in these two endpoints the same?
> That is, if we add metrics to the Prometheus metrics endpoint in the
> API, are we expected to add these to the PrometheusReportingTask as
> well?
> 3. If so, one way to get the metrics data into
> PrometheusReportingTask.java is to make an API call to
> /nifi-api/controller/config. Is that an acceptable way to get metrics
> data for max_event_driven_threads and max_timer_driven_threads?
>
> For context, here are the metrics we’ve added:
> nifi_repository_max_bytes{flowfile}
> nifi_repository_max_bytes{content}
> nifi_repository_max_bytes{provenance}
> nifi_repository_used_bytes{flowfile}
> nifi_repository_used_bytes{content}
> nifi_repository_used_bytes{provenance}
> jvm_deadlocked_thread_count
> max_event_driven_threads
> max_timer_driven_threads
> jvm_heap_non_init
> jvm_heap_non_committed
> jvm_heap_non_max
> jvm_heap_non_used
> jvm_heap_committed
> jvm_heap_init
> jvm_heap_max
>
> thanks
