[ https://issues.apache.org/jira/browse/NIFI-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15866337#comment-15866337 ]
ASF GitHub Bot commented on NIFI-3356:
--------------------------------------
Github user markap14 commented on a diff in the pull request:
https://github.com/apache/nifi/pull/1493#discussion_r101106345
--- Diff: nifi-docs/src/main/asciidoc/administration-guide.adoc ---
@@ -2074,7 +2074,25 @@ The Provenance Repository contains the information related to Data Provenance. T
|====
|*Property*|*Description*
-|nifi.provenance.repository.implementation|The Provenance Repository implementation. The default value is org.apache.nifi.provenance.PersistentProvenanceRepository and should only be changed with caution. To store provenance events in memory instead of on disk (at the risk of data loss in the event of power/machine failure), set this property to org.apache.nifi.provenance.VolatileProvenanceRepository.
+|nifi.provenance.repository.implementation|The Provenance Repository implementation. The default value is org.apache.nifi.provenance.PersistentProvenanceRepository.
+Two additional repositories are available as and should only be changed with caution.
+To store provenance events in memory instead of on disk (at the risk of data loss in the event of power/machine failure),
+set this property to org.apache.nifi.provenance.VolatileProvenanceRepository. This leaves a configurable number of Provenance Events in the Java heap, so the number
+of events that can be retained is very limited. It has been used essentially as a no-op repository and is not recommended.
--- End diff --
I can agree with that.
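
For reference, a sketch of how the property under discussion might look in NiFi's properties file. Both class names are taken from the diff above; which line is active here is illustrative only and is not a recommendation made in this thread.

```
# Default, disk-backed implementation (per the diff above)
nifi.provenance.repository.implementation=org.apache.nifi.provenance.PersistentProvenanceRepository

# In-memory alternative; per the diff, events are at risk of loss on power/machine failure
# nifi.provenance.repository.implementation=org.apache.nifi.provenance.VolatileProvenanceRepository
```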
> Provide a newly refactored provenance repository
> ------------------------------------------------
>
> Key: NIFI-3356
> URL: https://issues.apache.org/jira/browse/NIFI-3356
> Project: Apache NiFi
> Issue Type: Task
> Components: Core Framework
> Reporter: Mark Payne
> Assignee: Mark Payne
> Fix For: 1.2.0
>
>
> The Persistent Provenance Repository has been redesigned a few different
> times over several years. The original design for the repository was to
> provide storage of events and sequential iteration over those events via a
> Reporting Task. After that, we added the ability to compress the data so that
> it could be held longer. We then introduced the notion of indexing and
> searching via Lucene. We've since made several more modifications to try to
> boost performance.
> At this point, however, the repository is still the bottleneck for many flows
> that handle large volumes of small FlowFiles. We need a new implementation
> that is based around the current goals for the repository and that can
> provide better throughput.