[ https://issues.apache.org/jira/browse/NIFI-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15866229#comment-15866229 ]
ASF GitHub Bot commented on NIFI-3356:
--------------------------------------
Github user markap14 commented on a diff in the pull request:
https://github.com/apache/nifi/pull/1493#discussion_r101097304
--- Diff: nifi-docs/src/main/asciidoc/administration-guide.adoc ---
@@ -2074,7 +2074,25 @@ The Provenance Repository contains the information related to Data Provenance. T
|====
|*Property*|*Description*
-|nifi.provenance.repository.implementation|The Provenance Repository implementation. The default value is org.apache.nifi.provenance.PersistentProvenanceRepository and should only be changed with caution. To store provenance events in memory instead of on disk (at the risk of data loss in the event of power/machine failure), set this property to org.apache.nifi.provenance.VolatileProvenanceRepository.
+|nifi.provenance.repository.implementation|The Provenance Repository implementation. The default value is org.apache.nifi.provenance.PersistentProvenanceRepository.
+Two additional repositories are available as and should only be changed with caution.
--- End diff --
I agree - that was there previously, when the only two options were the
Volatile and Persistent Prov Repos, and the note was there to warn that you
should know what you're doing when you change to Volatile. This warning can be
removed now, I think, since there are two repos that provide persistent
storage of the data.
> Provide a newly refactored provenance repository
> ------------------------------------------------
>
> Key: NIFI-3356
> URL: https://issues.apache.org/jira/browse/NIFI-3356
> Project: Apache NiFi
> Issue Type: Task
> Components: Core Framework
> Reporter: Mark Payne
> Assignee: Mark Payne
> Fix For: 1.2.0
>
>
> The Persistent Provenance Repository has been redesigned a few different
> times over several years. The original design for the repository was to
> provide storage of events and sequential iteration over those events via a
> Reporting Task. After that, we added the ability to compress the data so that
> it could be held longer. We then introduced the notion of indexing and
> searching via Lucene. We've since made several more modifications to try to
> boost performance.
> At this point, however, the repository is still the bottleneck for many flows
> that handle large volumes of small FlowFiles. We need a new implementation
> that is based around the current goals for the repository and that can
> provide better throughput.