[ 
https://issues.apache.org/jira/browse/NIFI-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15866075#comment-15866075
 ] 

ASF GitHub Bot commented on NIFI-3356:
--------------------------------------

Github user olegz commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1493#discussion_r101079356
  
    --- Diff: nifi-docs/src/main/asciidoc/administration-guide.adoc ---
    @@ -2074,7 +2074,25 @@ The Provenance Repository contains the information 
related to Data Provenance. T
     
     |====
     |*Property*|*Description*
    -|nifi.provenance.repository.implementation|The Provenance Repository 
implementation. The default value is 
org.apache.nifi.provenance.PersistentProvenanceRepository and should only be 
changed with caution. To store provenance events in memory instead of on disk 
(at the risk of data loss in the event of power/machine failure), set this 
property to org.apache.nifi.provenance.VolatileProvenanceRepository.
    +|nifi.provenance.repository.implementation|The Provenance Repository 
implementation. The default value is 
org.apache.nifi.provenance.PersistentProvenanceRepository.
    +Two additional repositories are available as and should only be changed 
with caution.
    --- End diff --
    
    I am not sure '_should only be changed with caution_' is necessary. It 
sounds like the other two are broken or may break something. In reality they 
don't. They just behave differently.


> Provide a newly refactored provenance repository
> ------------------------------------------------
>
>                 Key: NIFI-3356
>                 URL: https://issues.apache.org/jira/browse/NIFI-3356
>             Project: Apache NiFi
>          Issue Type: Task
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>             Fix For: 1.2.0
>
>
> The Persistent Provenance Repository has been redesigned a few different 
> times over several years. The original design for the repository was to 
> provide storage of events and sequential iteration over those events via a 
> Reporting Task. After that, we added the ability to compress the data so that 
> it could be held longer. We then introduced the notion of indexing and 
> searching via Lucene. We've since made several more modifications to try to 
> boost performance.
> At this point, however, the repository is still the bottleneck for many flows 
> that handle large volumes of small FlowFiles. We need a new implementation 
> that is based around the current goals for the repository and that can 
> provide better throughput.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
