Hello Team!

I've been working with NiFi for a bit now and am seeing a usage pattern
within my team that I think could be improved. We have thrown around the
idea of creating an additional provenance repository implementation that
would allow the storage and retrieval of `ProvenanceEventRecord`s in an
external database / service to support more cloud-centric deployments.

Expanding where NiFi can store provenance would allow the instance/cluster
itself to offload the storage and management of provenance events to an
external tool such as Elasticsearch / OpenSearch or Solr.

When targeting cloud-based deployments of NiFi, resource constraints are
an important consideration. Externalizing some database-like features would
allow more resources to be allocated to data processing tasks. Also, in the
event that a container or VM needs to be replaced or scaled down, having
provenance stored in an external service would still allow other nodes in
the cluster to access those events.

My goal is to refactor some of the existing implementations within the
nifi-data-provenance-utils module to decouple them from being disk-centric.
To go along with that, I'd like to create some new interfaces that external
services could be built against.
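
To make the idea a bit more concrete, here is a rough sketch of the kind of
storage-agnostic interface I have in mind. Every name in it (`StoredEvent`,
`ExternalEventStore`, `InMemoryEventStore`) is hypothetical and is not part of
NiFi today; the event type is a simplified stand-in rather than the real
`ProvenanceEventRecord`, and the in-memory class only stands in for a client
that would talk to an external service.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-in for a provenance event.
// This is NOT NiFi's ProvenanceEventRecord; it only carries a few fields.
record StoredEvent(long eventId, String componentId, String eventType, long eventTimeMillis) {}

// Hypothetical contract with no assumptions about local disk.
// An external-service implementation would be written against this.
interface ExternalEventStore {
    // Persists an event and returns the id assigned to it.
    long store(String componentId, String eventType, long eventTimeMillis);

    // Looks up a single event by id.
    Optional<StoredEvent> getById(long eventId);

    // Returns up to maxResults events for a component, ordered by event id.
    List<StoredEvent> findByComponent(String componentId, int maxResults);
}

// In-memory placeholder for an external backend, for illustration only.
final class InMemoryEventStore implements ExternalEventStore {
    private final ConcurrentMap<Long, StoredEvent> events = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong(0);

    @Override
    public long store(String componentId, String eventType, long eventTimeMillis) {
        long id = nextId.getAndIncrement();
        events.put(id, new StoredEvent(id, componentId, eventType, eventTimeMillis));
        return id;
    }

    @Override
    public Optional<StoredEvent> getById(long eventId) {
        return Optional.ofNullable(events.get(eventId));
    }

    @Override
    public List<StoredEvent> findByComponent(String componentId, int maxResults) {
        return events.values().stream()
                .filter(e -> e.componentId().equals(componentId))
                .sorted(Comparator.comparingLong(StoredEvent::eventId))
                .limit(maxResults)
                .collect(Collectors.toList());
    }
}
```

The point of the sketch is only that callers depend on the interface, so the
in-memory class could be swapped for one backed by an external service without
the callers needing to know where the events live.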

In my research and prototyping for this, I've run into several situations
where, while trying to follow the existing patterns, subtyping some of the
existing types doesn't make sense for an external provider.

I don't yet have any complete implementations due to the amount of work I
think would be involved. So far my research has primarily been with using
Elasticsearch as a backing store.

I believe this would rise to the level of requiring a NiFi Improvement
Proposal (NIP), and I would like to see how the larger dev team feels about
this.

Thank you!

-- 
-David Y.