Hello Brian,

Jaeger would be a good choice because it is very widely used (almost the
standard with OpenTelemetry).
Have you looked at OpenLineage (https://openlineage.io/)? It might be
interesting here.

Thanks
Uwe

> On 23.05.2023 at 04:57, Brian Putt <puttbr...@gmail.com> wrote:
> 
> Hello Joe / All,
> 
> Jaeger and Grafana (with Tempo) offer comparable tools to visualize the
> trace data. I believe additional tools will be needed to get the most out
> of the trace data. We've been experimenting with a number of open source
> products to see what works best for the amount of trace data that NiFi
> emits. So far, Grafana Tempo, VictoriaMetrics, and ClickHouse seem to offer
> a good set of features for searching and viewing the traces, along with
> summarizing certain flowfile attributes. As long as the trace data is in
> OTEL's format, the collector offers flexibility in exporting the data to a
> number of services with ease.
> 
> I would expect a PR to OTEL's Java auto-instrumentation project over the
> next few months that adds NiFi to its list of instrumentations. If the NiFi
> committers would like a demo / tech exchange to go over the current state
> of the tracing agent, we'd be happy to accommodate. As it stands, the agent
> uses flowfile attributes to pass along the tracestate so that trace
> propagation can occur across NiFi-to-NiFi boundaries.
> 
> Thanks,
> 
> Brian
> 
>> On Wed, May 17, 2023 at 1:05 PM Joe Witt <joe.w...@gmail.com> wrote:
>> 
>> Brian Putt, All
>> 
>> Are you aware of any good tools/services that can ingest the traces and
>> provide an interesting view/story/report on them?
>> 
>> I could see us emitting OTEL events instead of using our current
>> provenance mechanism, using them internally to do what we already do while
>> also having a clear, spec-friendly way of exporting them to others.
>> 
>> Thanks
>> 
>> On Sat, Jul 30, 2022 at 7:43 AM u...@moosheimer.com <u...@moosheimer.com>
>> wrote:
>> 
>>> Hello Brian, Bryan, Greg, NiFi devs,
>>> 
>>> Integrating OpenTelemetry is a very good idea, especially since the major
>>> cloud providers also rely on it. This could also be interesting for
>>> Stateless NiFi.
>>> 
>>> I have a suggestion that I would like to put up for discussion.
>>> 
>>> Would it be useful to make a list of what extensions or new development
>>> would be helpful for a complete integration of OpenTelemetry?
>>> 
>>> I'm thinking of ConsumeMQTT and PublishMQTT, for example. Currently these
>>> support at most MQTT version 3.1.1, but since version 5 there are User
>>> Properties, which are similar to HTTP header fields.
>>> Thus one could implement OpenTelemetry in the MQTT processors much as it
>>> is done for HTTP.
>>> 
>>> With a list we could make an overview of the "necessary" adjustments and
>>> advertise for support.
>>> 
>>> If what I write is nonsense, then I may not have understood something and
>>> I take it all back :)
>>> 
>>> Best regards
>>> Kay-Uwe Moosheimer
>>> 
>>>> On 29.07.2022 at 05:09, Brian Putt <puttbr...@gmail.com> wrote:
>>>> 
>>>> Hello Bryan / Greg / NiFi devs,
>>>> 
>>>> Distributed tracing (DT) is similar to provenance in that it shows the
>>>> path a particular flowfile travels, but its core selling point is that it
>>>> supports tracing across multiple systems/services regardless of what's
>>>> receiving the data. Provenance is a fantastic feature, but there are
>>>> cases where one might want to draw the bigger picture and identify
>>>> bottlenecks as data flows from one system to another, where the other
>>>> system may or may not be using NiFi.
>>>> 
>>>> DT uses three ids: traceId, parentId, and spanId. While a tree can be
>>>> built using two ids, the third id (traceId) makes it easier to pull all
>>>> of the relevant information out of a datastore.
>>>> DT is focused more on performance and identifying bottlenecks in one or
>>>> more systems. Imagine if NiFi were receiving data from various sources
>>>> (e.g. HTTP, Kafka, SQS) and egressing to other destinations (HTTP, Kafka,
>>>> NiFi).
>>>> DT provides a spec that we'd be able to follow to correlate the data as it
>>>> traverses from system to system. Each system that participates in the DT
>>>> ecosystem would simply emit information (a trace is made up of one or more
>>>> spans), and a collection system would aggregate all of these spans, draw
>>>> a bigger picture of the path the data went through, and help identify key
>>>> bottlenecks.
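
To make the role of the three ids concrete, here is a minimal Python sketch of
what a collector might do with emitted spans. It is not NiFi or OTEL code: the
span records, the processor names used as labels, and the helper functions are
all hypothetical, and a real backend would also use timestamps and durations.

```python
from collections import defaultdict

# Hypothetical span records; the field names mirror the three ids above.
spans = [
    {"traceId": "t1", "spanId": "a", "parentId": None, "name": "ListenHTTP"},
    {"traceId": "t1", "spanId": "b", "parentId": "a", "name": "UpdateAttribute"},
    {"traceId": "t1", "spanId": "c", "parentId": "b", "name": "PublishKafka"},
    {"traceId": "t2", "spanId": "x", "parentId": None, "name": "ConsumeKafka"},
]


def build_trace_trees(spans):
    """Group spans by traceId, then link children to parents via parentId."""
    by_trace = defaultdict(list)
    for span in spans:
        by_trace[span["traceId"]].append(span)

    trees = {}
    for trace_id, members in by_trace.items():
        children = defaultdict(list)
        for span in members:
            # The root span of a trace has parentId None.
            children[span["parentId"]].append(span["spanId"])
        trees[trace_id] = dict(children)
    return trees


def path_from_root(tree):
    """Follow the first child at each level, starting from the root span."""
    path, current = [], None
    while tree.get(current):
        current = tree[current][0]
        path.append(current)
    return path


trees = build_trace_trees(spans)
print(path_from_root(trees["t1"]))
```

The spanId/parentId pair is enough to build each tree; the traceId only serves
as the lookup key that gathers the spans belonging to one trace.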
>>>> 
>>>> OpenTelemetry (OTEL) provides clients (across many languages, including
>>>> Java) with which developers can instrument their library's APIs and
>>>> participate in a DT ecosystem, as it adheres to the tracing spec.
>>>> Egressing trace data is possible without using OTEL, but then we may find
>>>> ourselves reinventing the wheel, although the result could be optimized
>>>> for NiFi.
>>>> 
>>>> Creating a reporting task could certainly be a path, though there are a
>>>> few concerns with that:
>>>> 
>>>> 1. If provenance is disabled, will provenance events still be emitted and
>>>> be collected by a new reporting task?
>>>> 2. There'll be an impact on performance; how much is unknown. OTEL is
>>>> gaining traction across industry and there are ways to mitigate the
>>>> performance impact, mainly sampling and the fact that *tracing is best
>>>> effort*. Spans would be emitted from NiFi via UDP to a collector on the
>>>> same network.
>>>> 3. Would there be any issues with appending a flowfile attribute that is
>>>> carried throughout the flow and maintains the traceId, parentSpanId, and
>>>> trace flags? See below for more details.
>>>> 
>>>> There's a W3C spec (Trace Context) which includes a formatted string that
>>>> would be propagated to services (HTTP, Kafka, etc.). So if NiFi were to
>>>> put information onto Kafka, any consumers of that data would be able to
>>>> continue the trace and help draw the bigger picture.
>>>> 
>>>> W3C Spec: https://www.w3.org/TR/trace-context/#traceparent-header
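
For reference, the formatted string in that spec is the `traceparent` value:
four lowercase hex fields joined by dashes (version, trace-id, parent-id,
trace-flags). The Python sketch below parses one and derives the value a
downstream hop would carry. The function names are hypothetical, and as a
simplification it only accepts version `00`, whereas the spec also describes
how to handle later versions.

```python
import re
import secrets

# version "00", 32-hex trace-id, 16-hex parent-id, 2-hex trace-flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(value):
    """Return the parsed fields, or None if the value is not usable."""
    match = TRACEPARENT_RE.match(value.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero ids are defined as invalid by the spec.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),  # lowest bit is the sampled flag
    }


def child_traceparent(parent):
    """Keep the trace-id and sampled flag, but mint a new id for this hop."""
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    flags = "01" if parent["sampled"] else "00"
    return f"00-{parent['trace_id']}-{span_id}-{flags}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
ctx = parse_traceparent(incoming)
outgoing = child_traceparent(ctx)
```

A consumer that receives `outgoing` (for example as a Kafka record header) can
parse it the same way and continue the trace under the same trace-id.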
>>>> 
>>>> For #2, since DT is focused on performance, sampling can help alleviate
>>>> chatter over the wire; ideally, sampling 0.01% would draw the same picture
>>>> as 1% or 10%+. This is certainly different from provenance, as DT
>>>> prioritizes performance over quality of the data and should not be
>>>> thought of as auditing.
>>>> 
>>>> https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#sampler
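
As an illustration of ratio-based sampling, the Python sketch below makes the
keep/drop decision from the trace id itself, so every system that sees the
same trace reaches the same decision without coordination. This is in the
spirit of a trace-id-ratio sampler but is not OTEL's implementation; the
function name and the choice to use the low 64 bits of the id are assumptions
made for the example.

```python
def should_sample(trace_id_hex, ratio):
    """Decide deterministically whether to keep a trace.

    trace_id_hex: 32-character hex trace id (assumed to be random).
    ratio: fraction of traces to keep, e.g. 0.0001 for 0.01%.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    # Treat the low 64 bits of the id as a uniformly distributed number
    # and keep the trace when it falls below the threshold.
    value = int(trace_id_hex[-16:], 16)
    return value < ratio * 2**64


# Same id, same ratio -> same answer on every hop.
decision = should_sample("4bf92f3577b34da6a3ce929d0e0e4736", 0.0001)
```

Because the decision depends only on the trace id and the ratio, a trace is
either recorded end to end or not at all, which keeps sampled traces complete.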
>>>> 
>>>>> On Thu, Jul 28, 2022 at 5:01 PM Bryan Bende <bbe...@gmail.com> wrote:
>>>>> 
>>>>> Hi Greg,
>>>>> 
>>>>> I don't really know anything about OpenTelemetry, but from the
>>>>> perspective of integrating something into the framework, some things
>>>>> to consider...
>>>>> 
>>>>> Is there some way to piggy-back on provenance and use a ReportingTask
>>>>> to process provenance events and report something to OpenTelemetry?
>>>>> 
>>>>> If something new does need to be added, it should probably be an
>>>>> extension point where there is an interface in the framework-api and
>>>>> different implementations can be plugged in.
>>>>> Ideally the framework itself wouldn't have any knowledge of
>>>>> OpenTelemetry specifically, it would only be reporting some
>>>>> information, which could then be used in some way by the OpenTelemetry
>>>>> implementation.
>>>>> 
>>>>> How does NiFi actually communicate with OpenTelemetry? Are you
>>>>> expecting to send data to OpenTelemetry in this new method you are
>>>>> suggesting?
>>>>> That would likely have a significant impact on the performance of the
>>>>> flow.
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Bryan
>>>>> 
>>>>>> On Thu, Jul 28, 2022 at 3:17 PM glma...@uwe.nsa.gov
>>>>>> <glma...@uwe.nsa.gov> wrote:
>>>>>> 
>>>>>> NiFi Devs,
>>>>>> 
>>>>>> My team and I are looking for guidance on how we can extend Apache
>>>>>> NiFi's capabilities. Specifically, we're looking to include distributed
>>>>>> tracing. We'll approach this effort as if we're the tracing experts and
>>>>>> simply seeking implementation guidance. Our developers have good
>>>>>> exposure to working with NiFi and creating custom processors. We plan to
>>>>>> fork the project to begin this effort but want to make sure we approach
>>>>>> this with the best possible direction for community adoption.
>>>>>> 
>>>>>> Our initial thought is to piggyback on how Provenance was implemented.
>>>>>> We essentially want to include a subroutine or method that gets
>>>>>> implicitly invoked when a processor's 'onTrigger' method is called.
>>>>>> From there we would analyze the FlowFile's attributes to check for the
>>>>>> existence of a 'traceId' and propagate it if found.
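
The "implicitly invoked" hook can be pictured as a wrapper around the
processor's trigger method. The sketch below uses a Python decorator as a
stand-in; it is not NiFi's Java API, the `on_trigger` signature and the
`processed` attribute are hypothetical, and starting a new trace when no
'traceId' is present is an assumption that the proposal does not spell out.

```python
import functools
import secrets


def traced(on_trigger):
    """Wrap a trigger function so trace handling runs before it implicitly."""

    @functools.wraps(on_trigger)
    def wrapper(attributes):
        attrs = dict(attributes)  # copy so the caller's attributes stay untouched
        # Propagate an existing traceId; otherwise start a new trace (assumption).
        attrs.setdefault("traceId", secrets.token_hex(16))
        return on_trigger(attrs)

    return wrapper


@traced
def on_trigger(attributes):
    # Stand-in for a processor: returns the attributes of the outgoing flowfile.
    return {**attributes, "processed": "true"}


continued = on_trigger({"traceId": "4bf92f3577b34da6a3ce929d0e0e4736"})
started = on_trigger({"filename": "example.txt"})
```

The processor body never touches the trace id itself, which is the point of
hooking in at the framework level rather than in each processor.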
>>>>>> 
>>>>>> We can expound upon all of these tracing/observability details if that
>>>>>> helps. We're able to provide a more detailed scope of this task as
>>>>>> well, but for now we just want to get feedback on our overall goal and
>>>>>> proposed approach.
>>>>>> 
>>>>>> Thanks,
>>>>>> Greg Marshall
>>>>> 
>>> 
>>> 
>> 
