[ 
https://issues.apache.org/jira/browse/NIFI-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828287#comment-15828287
 ] 

ASF GitHub Bot commented on NIFI-1135:
--------------------------------------

Github user markap14 commented on the issue:

    https://github.com/apache/nifi/pull/1413
  
    @mcgilman Looks great! +1 merged to master. Thanks for updating this!


> For Provenance Query, bring back Event Summaries instead of the Events 
> themselves
> ---------------------------------------------------------------------------------
>
>                 Key: NIFI-1135
>                 URL: https://issues.apache.org/jira/browse/NIFI-1135
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework, Core UI
>    Affects Versions: 1.0.0
>            Reporter: Mark Payne
>            Assignee: Matt Gilman
>             Fix For: 1.2.0
>
>
> Currently, when we query Provenance, we pull back up to 1000 events. These 
> are full Provenance Events with attributes, etc. If the query takes a long 
> time, we will repeatedly request the objects that have already matched the 
> query. This uses a great deal of heap and sends back very large JSON 
> objects (10+ MB is not uncommon, and it could potentially be far worse).
> We should instead use a ProvenanceEventSummary object. This object should 
> contain just the info shown in the results table and the pointer to the 
> actual event in the Provenance Store. This allows us to return query results 
> much faster, store less data in the heap, and provide less data back to the 
> end user with virtually the same experience.
> The one place where the UX would differ is when the user clicks the "info" 
> button to view the entire provenance event: we would have to pull the event 
> back from the server, rather than already having it in memory.
> We should consider storing all of the fields in the results table in Lucene 
> to provide faster results. Otherwise, we could still get potentially better 
> results with the current approach if we just ensure that the first fields 
> that we store are those in the results table. This allows us to read just a 
> small portion of the event from file and deserialize just a small amount of 
> data before moving on to the next event.
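
As a rough illustration of the idea in the description above, the following is a
minimal sketch, not NiFi's actual code. It assumes a hypothetical
ProvenanceEventSummary class holding a few results-table fields plus the event
id that points back to the full event, and a hypothetical SummaryFirstCodec
that writes those fields at the front of each record so that a reader can stop
before the attributes. The field choice and the record layout are assumptions
made only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;

// Hypothetical summary type: only what a results table might show,
// plus the id used to fetch the full event later.
final class ProvenanceEventSummary {
    final long eventId;        // pointer to the full event in the store
    final long eventTime;      // epoch millis
    final String eventType;    // e.g. "RECEIVE"
    final String componentId;  // component that emitted the event

    ProvenanceEventSummary(long eventId, long eventTime,
                           String eventType, String componentId) {
        this.eventId = eventId;
        this.eventTime = eventTime;
        this.eventType = eventType;
        this.componentId = componentId;
    }
}

// Hypothetical record layout: summary fields first, attributes last.
final class SummaryFirstCodec {

    // Serializes the summary fields first, followed by the attributes.
    static byte[] write(ProvenanceEventSummary s, Map<String, String> attributes) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(s.eventId);
            out.writeLong(s.eventTime);
            out.writeUTF(s.eventType);
            out.writeUTF(s.componentId);
            out.writeInt(attributes.size());
            for (Map.Entry<String, String> e : attributes.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads only the leading summary fields; the attributes that follow
    // in the record are never deserialized.
    static ProvenanceEventSummary readSummary(byte[] record) {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(record))) {
            long eventId = in.readLong();
            long eventTime = in.readLong();
            String eventType = in.readUTF();
            String componentId = in.readUTF();
            return new ProvenanceEventSummary(eventId, eventTime, eventType, componentId);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a layout like this, a query would return only summaries, and the full
event (attributes included) would be fetched by eventId when the user asks for
the details of a single event.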



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
