Hi David,

I also think omitting external content storage from this feature is for the
best.

Surfacing all provenance events through the UI would add a third state that
a provenance event could have for its data: "present but inaccessible".
That's in addition to "deleted" and "present". That way, a user would know
the content can be retrieved by starting up the node that holds it.

This reminds me of the AzureLogAnalyticsProvenanceReportingTask. I've
considered using it for when we begin NiFi clustering at work. A
NiFi-native solution would be my preference because of the seamless
experience I would get.

-Eric


On Wed, Mar 11, 2026, 11:58 AM David Young <[email protected]> wrote:

> Hello Eric,
>
> Unless I'm mistaken, a provenance record itself doesn't have any content
> attached; it's all metadata and attributes.
> Now, that's not to say there isn't a linkage that would potentially be
> broken if the event were to be retrieved on a different cluster node.
> That could be handled with an external content store, but that is outside
> the scope of this particular bit of work.
>
> On Wed, Mar 11, 2026 at 2:49 PM Eric Secules <[email protected]> wrote:
>
> > Hi David,
> >
> > I'd really like this feature as well, especially for clustered nifi that
> > changes size based on load.
> >
> > How do you envision dealing with flow file content attached to provenance
> > records?
> >
> > Thanks,
> > Eric Secules
> >
> >
> > On Wed, Mar 11, 2026, 11:30 AM Mike Hogue <[email protected]> wrote:
> >
> > > > Long ago and along these lines, I’d explored the idea of contributing
> > > > provenance/lineage as a top-level signal to OpenTelemetry. They
> > > > supported the idea, but I didn’t have the cycles to see it through at
> > > > the time.
> > >
> > >
> >
> https://github.com/open-telemetry/opentelemetry-specification/issues/3447
> > >
> > > > While NiFi does support visualizing basic telemetry within the app, I
> > > > think most elect to externalize it to standard observability tooling.
> > > > This was my motive for the idea and perhaps it would be a valid venture
> > > > here as well.
> > >
> > > Thanks,
> > > Mike
> > >
> > > > On Wed, Mar 11, 2026 at 18:39 Pierre Villard <[email protected]> wrote:
> > >
> > > > Hi David,
> > > >
> > > > > I think this would be a great improvement for NiFi. I have considered a
> > > > > similar approach in the past but I didn't have the time to pursue it
> > > > > further, so I ended up using a reporting task instead. I do think that
> > > > > having extensibility of the provenance repository would be the best
> > > > > approach for anything production grade.
> > > > >
> > > > > I would be very interested in seeing this move forward and agree that
> > > > > this would need to follow the NIP process.
> > > >
> > > > Thanks,
> > > > Pierre
> > > >
> > > >
> > > > > On Wed, Mar 11, 2026 at 18:25, David Young <[email protected]> wrote:
> > > >
> > > > > Hello Team!
> > > > >
> > > > > > I've been working with NiFi for a bit now and am seeing a usage pattern
> > > > > > within my team that I think could be improved. We have thrown around the
> > > > > > idea of creating an additional provenance repository implementation that
> > > > > > would allow the storage and retrieval of `ProvenanceEventRecords` in an
> > > > > > external database / service to support more cloud-centric deployments.
> > > > >
> > > > > > Expanding where NiFi can store provenance would allow the
> > > > > > instance/cluster itself to offload the storage and management of
> > > > > > provenance events to an external tool, e.g. Elasticsearch / OpenSearch,
> > > > > > Solr, etc.
> > > > >
> > > > > > When targeting cloud-based deployments of NiFi, resource constraints are
> > > > > > an important consideration. Externalizing some database-like features
> > > > > > would allow more resources to be allocated to data processing tasks.
> > > > > > Also, in the event that a container or VM needs to be replaced or scaled
> > > > > > down, having provenance stored in an external service would still allow
> > > > > > other nodes in the cluster to access those events.
> > > > >
> > > > > > My goal is to refactor some of the existing implementations within the
> > > > > > nifi-data-provenance-utils module to decouple them from being
> > > > > > disk-centric. To go along with that, I'd like to create some new
> > > > > > interfaces that external services could be built against.
> > > > >
> > > > > > In my research and prototyping for this, I've run into several
> > > > > > situations where, while trying to follow the existing patterns,
> > > > > > sub-typing some of the existing things doesn't make sense for an
> > > > > > external provider.
> > > > >
> > > > > > I don't yet have any complete implementations due to the amount of work
> > > > > > I think would be involved. So far my research has primarily been with
> > > > > > Elasticsearch as a backing store.
> > > > >
> > > > > > I believe this would rise to the level of requiring a NIP and would like
> > > > > > to see how the larger dev team feels about this.
> > > > > > Thank you!
> > > > >
> > > > > --
> > > > > -David Y.
> > > > >
> > > >
> > >
> >
>
>
> --
> -David
>
