[ 
https://issues.apache.org/jira/browse/NIFI-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16518065#comment-16518065
 ] 

ASF GitHub Bot commented on NIFI-4838:
--------------------------------------

Github user MikeThomsen commented on the issue:

    https://github.com/apache/nifi/pull/2448
  
    @mattyb149 here's the use case that led to this for the sake of discussion:
    
    > Client has a few huge collections. Client wants to be able to fetch very 
large chunks of them at a time. Client is unhappy that they have to wait for 
the full query execution before anything happens in the UI. In the client's 
non-production environments, it takes a few hours of silent processing before 
anything is committed to the session and shows up in the UI. Client's 
technical folks would probably accept log statements at each iteration (where 
it makes sense) to show "yeah, I'm doing something" from GetMongo.
    
    So how about this third way that I could get done pretty quickly for 1.8...
    
    1. Add RunMongoCollectionFetch as a no-input processor that works like the 
input sources you referenced above. It includes full query control, 
progressive commits, progress attributes, etc.
    2. Remove progressive commits from GetMongo, keep the option to calculate 
progress attributes, and either way add info-level log statements (which can 
be turned off) reporting that a new flowfile (or X of them, in the case of the 
1:1 result/flowfile config) was prepared.
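
For the sake of discussion, the "progressive commits" idea in item 1 can be sketched roughly as below. This is Python for brevity, not NiFi's Java, and it is only an illustration of the batching pattern: `fetch_in_batches`, `commit`, and `batch_size` are hypothetical names, not NiFi or MongoDB driver APIs, and `cursor` stands in for any iterable of result documents.

```python
import logging
from itertools import islice
from typing import Callable, Iterable, List

logger = logging.getLogger("mongo_fetch_sketch")


def fetch_in_batches(
    cursor: Iterable[dict],
    batch_size: int,
    commit: Callable[[List[dict]], None],
) -> int:
    """Consume `cursor` in chunks, committing after each chunk.

    Committing per chunk (instead of once at the end) is what lets a
    user see output before the whole query has been drained.
    Returns the total number of documents handed to `commit`.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    iterator = iter(cursor)
    total = 0
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        commit(batch)  # hypothetical stand-in for a session commit
        total += len(batch)
        # Info-level progress message; silence it via the logger's level.
        logger.info("committed %d documents (%d so far)", len(batch), total)
    return total
```

With seven documents and a batch size of three, `commit` would be called three times, with batches of 3, 3, and 1 documents.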


> Make GetMongo support multiple commits and give some progress indication
> ------------------------------------------------------------------------
>
>                 Key: NIFI-4838
>                 URL: https://issues.apache.org/jira/browse/NIFI-4838
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Mike Thomsen
>            Assignee: Mike Thomsen
>            Priority: Major
>
> It shouldn't wait until the end to do a commit() call because the effect is 
> that GetMongo looks like it has hung to a user who is pulling a very large 
> data set.
> It should also have an option for running a count query to get the current 
> approximate count of documents that would match the query and append an 
> attribute that indicates where a flowfile stands in the total result count. 
> Ex:
> query.progress.point.start = 2500
> query.progress.point.end = 5000
> query.count.estimate = 17,568,231
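
A minimal sketch of how the three attributes in the example above could be derived for one batch, assuming fixed-size batches numbered from zero. It is written in Python for brevity rather than NiFi's Java; `progress_attributes` and its parameters are hypothetical, and writing the estimate as plain digits (no thousands separators) is an assumption made here so the value stays machine-readable.

```python
from typing import Dict


def progress_attributes(
    batch_index: int,
    batch_size: int,
    batch_len: int,
    count_estimate: int,
) -> Dict[str, str]:
    """Build the progress attributes for a single batch of results.

    batch_index    -- zero-based position of this batch in the result stream
    batch_size     -- configured number of documents per batch
    batch_len      -- documents actually in this batch (the last may be short)
    count_estimate -- approximate total from a separate count query
    """
    start = batch_index * batch_size
    end = start + batch_len
    return {
        "query.progress.point.start": str(start),
        "query.progress.point.end": str(end),
        "query.count.estimate": str(count_estimate),
    }
```

For the second batch of 2500 documents (`batch_index=1`), this gives a start of 2500 and an end of 5000, matching the example values in the issue description.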



