[
https://issues.apache.org/jira/browse/NIFI-10110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
brian updated NIFI-10110:
-------------------------
Description:
It would be great if we had a way to trace data as it flows through NiFi and
identify bottlenecks in our flows. Distributed tracing would let us follow data
as it moves through various systems (which can include other NiFi
instances/clusters) and let teams control the level of visibility through
sampling.
Ideally, we would make tracing as easy as possible to adopt, either by adding
it at the AbstractProcessor level or by providing annotations that can be
applied to individual processors.
Some things to think about:
* How to expose FlowFile attributes as span tags, so that we can include some
metadata in the spans for a given trace
* How to make sampling decisions from those FlowFile attributes based on
simple key/value matching (e.g. "I want to sample data from a specific system")
* Processors that egress data (e.g. PostHTTP, Kafka processors) will need to
include a W3C Trace Context header so that the downstream system can continue
the trace
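The first two bullets could be sketched roughly as follows. This is a hypothetical, pure-Java stand-in (class and method names are made up, not NiFi or OTel API), just to show attribute-based sampling and the W3C `traceparent` header shape (`00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>`):

```java
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch: sample on a FlowFile attribute match, then build the
// W3C Trace Context "traceparent" header an egress processor would send.
public class TraceSketch {

    // Sample when a FlowFile attribute matches a configured key/value pair,
    // e.g. "source.system" -> "billing".
    public static boolean shouldSample(Map<String, String> flowFileAttributes,
                                       String key, String expectedValue) {
        return expectedValue.equals(flowFileAttributes.get(key));
    }

    // traceparent: version "00", 32-hex trace-id, 16-hex parent-id,
    // 2-hex trace-flags ("01" = sampled, "00" = not sampled).
    public static String traceparent(String traceIdHex, String spanIdHex,
                                     boolean sampled) {
        return String.format("00-%s-%s-%s", traceIdHex, spanIdHex,
                sampled ? "01" : "00");
    }

    // Random lowercase hex string of the given length (ids for the demo).
    public static String randomHex(int chars) {
        StringBuilder sb = new StringBuilder(chars);
        for (int i = 0; i < chars; i++) {
            sb.append(Integer.toHexString(ThreadLocalRandom.current().nextInt(16)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> attrs = Map.of("source.system", "billing");
        boolean sampled = shouldSample(attrs, "source.system", "billing");
        System.out.println(traceparent(randomHex(32), randomHex(16), sampled));
    }
}
```

In a real implementation the sampling hook and header injection would live in the framework (or the OTel SDK's sampler), not in each processor.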
The biggest question will be how to properly handle span context when dealing
with batching; the OTel spec provides some insight into this:
[https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/overview.md#links-between-spans]
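As a rough model of the span-links idea for batching: when a processor merges N FlowFiles into one output, the batch span cannot have N parents, but it can carry N links back to the originating span contexts. The types below are plain-Java stand-ins, not the real opentelemetry-java API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of OTel span links for a batching processor.
public class BatchLinksSketch {

    // Minimal stand-in for an OTel SpanContext (trace-id + span-id only).
    public record SpanContext(String traceId, String spanId) {}

    // A batch "span" that records one link per consumed FlowFile.
    public static class BatchSpan {
        public final String name;
        public final List<SpanContext> links = new ArrayList<>();

        public BatchSpan(String name) {
            this.name = name;
        }

        public void addLink(SpanContext ctx) {
            links.add(ctx);
        }
    }

    public static void main(String[] args) {
        BatchSpan merge = new BatchSpan("MergeContent");
        // Each input FlowFile contributes its own trace context as a link,
        // even when the inputs come from different traces.
        merge.addLink(new SpanContext("trace-a", "span-1"));
        merge.addLink(new SpanContext("trace-b", "span-2"));
        System.out.println(merge.links.size());
    }
}
```

In the real opentelemetry-java API this corresponds to `SpanBuilder.addLink(SpanContext)` before starting the batch span.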
Java lib: [https://github.com/open-telemetry/opentelemetry-java-instrumentation]
Messaging system spec draft:
[https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/messaging.md]
Messaging spec PRs:
[https://github.com/open-telemetry/oteps/pulls?q=is%3Apr+is%3Aopen+messaging]
We did try running NiFi with the OTel auto-instrumentation. Some of the
processors we used generated spans, but the context was not carried between
them, so OTel produced a separate trace per processor. Ideally, we would
implement native OTel support in NiFi and avoid the startup cost of bytecode
injection.
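To illustrate what native propagation at the AbstractProcessor level might do (hypothetical names, pure Java, not the NiFi or OTel API): read a `traceparent` FlowFile attribute in W3C Trace Context format and recover the upstream trace/span ids, so the next span joins the same trace instead of starting a new one, which is exactly what the auto-instrumentation failed to do:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of framework-level context extraction from a
// "traceparent" FlowFile attribute (W3C Trace Context format).
public class ContextExtractSketch {

    // version "00" - 32 hex trace-id - 16 hex parent-id - 2 hex flags
    private static final Pattern TRACEPARENT =
            Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");

    public record ParentContext(String traceId, String parentSpanId,
                                boolean sampled) {}

    public static Optional<ParentContext> extract(String traceparent) {
        if (traceparent == null) {
            return Optional.empty();
        }
        Matcher m = TRACEPARENT.matcher(traceparent);
        if (!m.matches()) {
            return Optional.empty();
        }
        // Bit 0 of trace-flags is the "sampled" flag.
        boolean sampled = (Integer.parseInt(m.group(3), 16) & 0x01) == 1;
        return Optional.of(new ParentContext(m.group(1), m.group(2), sampled));
    }

    public static void main(String[] args) {
        var ctx = extract("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01");
        ctx.ifPresent(c -> System.out.println(c.traceId()));
    }
}
```

A native integration would presumably use the OTel SDK's `W3CTraceContextPropagator` rather than parsing by hand; the point is that the framework, not each processor, would own this step.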
> Add in OpenTelemetry integration to support distributed tracing
> ---------------------------------------------------------------
>
> Key: NIFI-10110
> URL: https://issues.apache.org/jira/browse/NIFI-10110
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Core Framework
> Reporter: brian
> Priority: Major
> Labels: opentelemetry, tracing
>
--
This message was sent by Atlassian Jira
(v8.20.7#820007)