Hi NiFi team, The ConsumeKafka <https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-kafka-2-6-nar/stable/org.apache.nifi.processors.kafka.pubsub.ConsumeKafka_2_6/index.html> processor currently offers at-least-once semantics through asynchronous offset commits to Kafka, meaning that the data can be duplicated for example if a cluster restart occurs before the offsets are acknowledged. One way to guarantee exactly-once semantics (assuming no data loss in NiFi from hardware failures) instead could be to persist some processor state within NiFi, tracking the current offsets. But, this state management has to be atomic with operations performed on the flowfile repository, therefore the StateManager <https://nifi.apache.org/nifi-docs/developer-guide.html#state_manager> is unsuitable.
I was wondering whether it would be feasible, in a reliable way, to retrieve such state from the latest flowfiles' attributes, presumably from the provenance repository? Which I assume is atomic with / recovered from the flowfile repository? I am not familiar with Lucene; I may be looking for something similar to WriteAheadProvenanceRepository::getLatestCachedEvents <https://github.com/apache/nifi/blob/main/nifi-framework-bundle/nifi-framework-extensions/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/java/org/apache/nifi/provenance/WriteAheadProvenanceRepository.java#L260>, but that would leverage up-to-date persisted information, rather than an in-memory map that is lost upon cluster restart. Thanks,