Some time ago, along these lines, I explored the idea of contributing
provenance/lineage as a top-level signal to OpenTelemetry. The OpenTelemetry
community supported the idea, but I didn’t have the cycles to see it through
at the time.

https://github.com/open-telemetry/opentelemetry-specification/issues/3447

While NiFi does support visualizing basic telemetry within the app, I think
most users elect to externalize it to standard observability tooling. That was
my motivation for the idea, and perhaps it would be a worthwhile venture here
as well.

Thanks,
Mike

On Wed, Mar 11, 2026 at 18:39 Pierre Villard <[email protected]>
wrote:

> Hi David,
>
> I think this would be a great improvement for NiFi. I have considered a
> similar approach in the past but I didn't have the time to pursue it
> further, so I ended up using a reporting task instead. I do think that
> having extensibility of the provenance repository would be the best
> approach for anything production grade.
>
> I would be very interested in seeing this move forward and agree that this
> would need to follow the NIP process.
>
> Thanks,
> Pierre
>
>
> On Wed, Mar 11, 2026 at 18:25, David Young <[email protected]> wrote:
>
> > Hello Team!
> >
> > I've been working with NiFi for a bit now and am seeing a usage pattern
> > within my team that I think could be improved. We have thrown around the
> > idea of creating an additional provenance repository implementation that
> > would allow the storage and retrieval of `ProvenanceEventRecord`s in an
> > external database / service to support more cloud-centric deployments.
> >
> > Expanding where NiFi can store provenance would allow the instance/cluster
> > itself to offload the storage and management of provenance events to an
> > external tool, e.g. Elasticsearch / OpenSearch, Solr, etc.
> >
> > When targeting cloud-based deployments of NiFi, resource constraints are
> > an important consideration. Externalizing some database-like features would
> > allow more resources to be allocated to data processing tasks. Also, in the
> > event that a container or VM needs to be replaced or scaled down, having
> > provenance stored in an external service would still allow other nodes in
> > the cluster to access those events.
> >
> > My goal is to refactor some of the existing implementations within the
> > nifi-data-provenance-utils module to decouple them from being disk-centric.
> > To go along with that, I'd like to create some new interfaces that external
> > services could be built against.
> >
> > In my research and prototyping for this, I've run into several situations
> > where, while trying to follow the existing patterns, sub-typing some of the
> > existing types doesn't make sense for an external provider.
> >
> > I don't yet have any complete implementations due to the amount of work I
> > think would be involved. So far my research has primarily been with using
> > Elasticsearch as a backing store.
> >
> > I believe this would rise to the level of requiring a NIP and would like to
> > see how the larger dev team feels about this.
> > Thank you!
> >
> > --
> > -David Y.
> >
>
