[ https://issues.apache.org/jira/browse/NIFI-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702405#comment-14702405 ]

Aldrin Piri commented on NIFI-756:
----------------------------------

[~markap14] Looks like this patch needs to be reformatted as well. I tried
reformatting the paths in the patch, but git apply failed with the following:

{quote}
± % git apply ~/Downloads/0001-NIFI-756-Do-not-remove-documents-from-a-Lucene-Index.patch
error: patch failed: nifi-nar-bundles/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/java/org/apache/nifi/provenance/PersistentProvenanceRepository.java:934
error: nifi-nar-bundles/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/java/org/apache/nifi/provenance/PersistentProvenanceRepository.java: patch does not apply
{quote}

> Persistent Provenance Repository can avoid deleting events from lucene
> ----------------------------------------------------------------------
>
>                 Key: NIFI-756
>                 URL: https://issues.apache.org/jira/browse/NIFI-756
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>             Fix For: 0.3.0
>
>         Attachments: 
> 0001-NIFI-756-Do-not-remove-documents-from-a-Lucene-Index.patch
>
>
> Currently, when events expire in the repository, they are deleted from the
> Lucene indices, which is very expensive. Since the index is sharded (by
> default at 500 MB), we can instead ensure that every search has a start date
> no earlier than the earliest provenance event still in the repository. This
> way, searches never return expired records, even though those records remain
> in the index. Once all events in an index have expired (which we know from
> the earliest event of the next index), we can simply close all
> readers/writers for the expired index and delete the entire index. This is
> far cheaper than continually updating the Lucene indices and would make a
> huge difference in performance.
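
The approach described in the issue can be illustrated with a minimal sketch.
This is not the code from the attached patch and it does not use the Lucene
API. The class IndexExpirationSketch, its methods, and the shard names are all
hypothetical, and the sketch assumes that each index shard is identified by the
timestamp of its earliest event and that shards cover non-overlapping,
increasing time ranges.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical illustration only; not NiFi's PersistentProvenanceRepository.
final class IndexExpirationSketch {
    // Earliest event time (ms) of each index shard -> hypothetical shard name.
    private final TreeMap<Long, String> shardsByStartTime = new TreeMap<>();
    // Timestamp (ms) of the earliest event that has not yet expired.
    private long earliestLiveEventTime = 0L;

    void addShard(long earliestEventTime, String shardName) {
        shardsByStartTime.put(earliestEventTime, shardName);
    }

    void setEarliestLiveEventTime(long timestamp) {
        earliestLiveEventTime = timestamp;
    }

    // Searches never start before the earliest live event, so expired
    // records left in an index are never returned.
    long clampStartTime(long requestedStartTime) {
        return Math.max(requestedStartTime, earliestLiveEventTime);
    }

    // A shard is fully expired when the next shard's earliest event is not
    // later than the earliest live event. Such a shard is dropped whole
    // rather than having its documents deleted one by one. A real
    // implementation would close readers/writers and delete the directory.
    List<String> removeFullyExpiredShards() {
        List<String> removed = new ArrayList<>();
        while (shardsByStartTime.size() > 1) {
            Long first = shardsByStartTime.firstKey();
            Long next = shardsByStartTime.higherKey(first);
            if (next > earliestLiveEventTime) {
                break; // the first shard may still hold live events
            }
            removed.add(shardsByStartTime.remove(first));
        }
        return removed;
    }
}
```

In this sketch the newest shard is never removed, because there is no later
shard whose earliest event could prove that it has fully expired.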



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
