[
https://issues.apache.org/jira/browse/NIFI-10110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
brian updated NIFI-10110:
-------------------------
Description:
It would be great if we had a way to trace data as it flows through NiFi and
identify bottlenecks in our flows. Distributed tracing would let us follow data
as it moves through various systems (which can include other NiFi
instances/clusters) and let teams control the level of visibility through
sampling.
Ideally, we would make tracing as easy as possible to adopt, either by adding
it at the AbstractProcessor level or by providing annotations that can be
applied to individual processors.
Some things to think about:
* How to expose FlowFile attributes as span tags, so that we can include some
metadata in the spans for a given trace
* How to make sampling decisions from those FlowFile attributes based on
simple key/value matching (e.g. "I want to sample data from a specific system")
* Processors that egress data (e.g. PostHTTP, Kafka processors) will need to
include a W3C Trace Context header so that the downstream system can continue
the trace
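The first two bullets could be sketched roughly as follows. This is a hypothetical, pure-Java stand-in (class and method names are made up, not NiFi or OTel API), just to show attribute-based sampling and the W3C `traceparent` header shape (`00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>`):

```java
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch: sample on a FlowFile attribute match, then build the
// W3C Trace Context "traceparent" header an egress processor would send.
public class TraceSketch {

    // Sample when a FlowFile attribute matches a configured key/value pair,
    // e.g. "source.system" -> "billing".
    public static boolean shouldSample(Map<String, String> flowFileAttributes,
                                       String key, String expectedValue) {
        return expectedValue.equals(flowFileAttributes.get(key));
    }

    // traceparent: version "00", 32-hex trace-id, 16-hex parent-id,
    // 2-hex trace-flags ("01" = sampled, "00" = not sampled).
    public static String traceparent(String traceIdHex, String spanIdHex,
                                     boolean sampled) {
        return String.format("00-%s-%s-%s", traceIdHex, spanIdHex,
                sampled ? "01" : "00");
    }

    // Random lowercase hex string of the given length (ids for the demo).
    public static String randomHex(int chars) {
        StringBuilder sb = new StringBuilder(chars);
        for (int i = 0; i < chars; i++) {
            sb.append(Integer.toHexString(ThreadLocalRandom.current().nextInt(16)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> attrs = Map.of("source.system", "billing");
        boolean sampled = shouldSample(attrs, "source.system", "billing");
        System.out.println(traceparent(randomHex(32), randomHex(16), sampled));
    }
}
```

In a real implementation the sampling hook and header injection would live in the framework (or the OTel SDK's sampler), not in each processor.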
The biggest question will be how to properly handle span context when dealing
with batching; the OTel spec provides some insight into this:
[https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/overview.md#links-between-spans]
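As a rough model of the span-links idea for batching: when a processor merges N FlowFiles into one output, the batch span cannot have N parents, but it can carry N links back to the originating span contexts. The types below are plain-Java stand-ins, not the real opentelemetry-java API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of OTel span links for a batching processor.
public class BatchLinksSketch {

    // Minimal stand-in for an OTel SpanContext (trace-id + span-id only).
    public record SpanContext(String traceId, String spanId) {}

    // A batch "span" that records one link per consumed FlowFile.
    public static class BatchSpan {
        public final String name;
        public final List<SpanContext> links = new ArrayList<>();

        public BatchSpan(String name) {
            this.name = name;
        }

        public void addLink(SpanContext ctx) {
            links.add(ctx);
        }
    }

    public static void main(String[] args) {
        BatchSpan merge = new BatchSpan("MergeContent");
        // Each input FlowFile contributes its own trace context as a link,
        // even when the inputs come from different traces.
        merge.addLink(new SpanContext("trace-a", "span-1"));
        merge.addLink(new SpanContext("trace-b", "span-2"));
        System.out.println(merge.links.size());
    }
}
```

In the real opentelemetry-java API this corresponds to `SpanBuilder.addLink(SpanContext)` before starting the batch span.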
Java lib: [https://github.com/open-telemetry/opentelemetry-java-instrumentation]
Messaging system spec draft:
[https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/messaging.md]
Messaging spec PRs:
[https://github.com/open-telemetry/oteps/pulls?q=is%3Apr+is%3Aopen+messaging]
We did try running NiFi with the OTel auto-instrumentation. Some of the
processors we used generated spans, but the context was not carried between
them, so OTel produced a separate trace per processor. Ideally, we would
implement native OTel support in NiFi and avoid the startup cost of bytecode
injection.
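To illustrate what native propagation at the AbstractProcessor level might do (hypothetical names, pure Java, not the NiFi or OTel API): read a `traceparent` FlowFile attribute in W3C Trace Context format and recover the upstream trace/span ids, so the next span joins the same trace instead of starting a new one, which is exactly what the auto-instrumentation failed to do:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of framework-level context extraction from a
// "traceparent" FlowFile attribute (W3C Trace Context format).
public class ContextExtractSketch {

    // version "00" - 32 hex trace-id - 16 hex parent-id - 2 hex flags
    private static final Pattern TRACEPARENT =
            Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");

    public record ParentContext(String traceId, String parentSpanId,
                                boolean sampled) {}

    public static Optional<ParentContext> extract(String traceparent) {
        if (traceparent == null) {
            return Optional.empty();
        }
        Matcher m = TRACEPARENT.matcher(traceparent);
        if (!m.matches()) {
            return Optional.empty();
        }
        // Bit 0 of trace-flags is the "sampled" flag.
        boolean sampled = (Integer.parseInt(m.group(3), 16) & 0x01) == 1;
        return Optional.of(new ParentContext(m.group(1), m.group(2), sampled));
    }

    public static void main(String[] args) {
        var ctx = extract("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01");
        ctx.ifPresent(c -> System.out.println(c.traceId()));
    }
}
```

A native integration would presumably use the OTel SDK's `W3CTraceContextPropagator` rather than parsing by hand; the point is that the framework, not each processor, would own this step.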
> Add in OpenTelemetry integration to support distributed tracing
> ---------------------------------------------------------------
>
> Key: NIFI-10110
> URL: https://issues.apache.org/jira/browse/NIFI-10110
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Core Framework
> Reporter: brian
> Priority: Major
> Labels: opentelemetry, tracing
>
--
This message was sent by Atlassian Jira
(v8.20.7#820007)