[jira] [Commented] (NIFI-4838) Make GetMongo support multiple commits and give some progress indication

ASF GitHub Bot (JIRA) Wed, 20 Jun 2018 08:53:07 -0700


    [ https://issues.apache.org/jira/browse/NIFI-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16518309#comment-16518309 ]


ASF GitHub Bot commented on NIFI-4838:
--------------------------------------

Github user mattyb149 commented on the issue:

    https://github.com/apache/nifi/pull/2448
  
    That use case is what prompted the Jira for QueryDatabaseTable to support 
incremental commits. QueryDatabaseTable has no "original" relationship, so 
that wasn't an issue there, but its documentation explains that some 
attributes won't be populated if "Output Batch Size" is specified.
    
    Having a source processor (GetMongo) with an "original" relationship is 
where it got a little weird for me, as the pattern I'm aware of (that is 
getting increasingly popular) is to have source processors accept incoming flow 
files for the purpose of configuration, not content per se. That's why I wasn't 
sure where the "original" relationship belonged in the use case. Often it is 
used downstream as a signal that all documents have been processed, but that 
wouldn't apply if we were using incremental commits.
    
    I'd prefer to avoid a new processor if we can make it work in a 
user-friendly manner for GetMongo. My vote is for my option #1 above, with 
sufficient documentation to describe the behavior. If a use case requires the 
"original" flow file to be transferred, then the user won't be able to use 
incremental commits. On the other hand (as @bbende suggested to me), users 
could always send the upstream flow to both GetMongo (with incremental commits) 
and to another flow. At that point it'd be similar to my option #2 where a 
different downstream flow would get the original flow file while it was also 
being worked on by GetMongo. By keeping the child flow files connected to the 
original as a parent (via provenance and the session), we still have the 
lineage intact. Thoughts?


> Make GetMongo support multiple commits and give some progress indication
> ------------------------------------------------------------------------
>
>                 Key: NIFI-4838
>                 URL: https://issues.apache.org/jira/browse/NIFI-4838
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Mike Thomsen
>            Assignee: Mike Thomsen
>            Priority: Major
>
> GetMongo shouldn't wait until the end to call commit(), because to a user 
> who is pulling a very large data set it looks like the processor has hung.
> It should also have an option for running a count query to get the current 
> approximate count of documents that would match the query and append an 
> attribute that indicates where a flowfile stands in the total result count. 
> Ex:
> query.progress.point.start = 2500
> query.progress.point.end = 5000
> query.count.estimate = 17,568,231
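
To make the proposed attributes concrete, here is a minimal sketch in plain Java of how per-batch progress attributes could be assembled. It does not use the NiFi or MongoDB APIs; the class name `ProgressAttributes`, the method `forBatch`, and the fixed batch size are hypothetical, and the values are written as plain digits rather than with thousands separators. Only the three attribute names come from the issue description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of NiFi: builds the progress attributes
// named in the issue description for a single batch of results.
public class ProgressAttributes {

    /**
     * @param start         index of the first document in this batch
     * @param end           index just past the last document in this batch
     * @param countEstimate approximate number of documents matching the query
     */
    public static Map<String, String> forBatch(long start, long end, long countEstimate) {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("invalid batch range");
        }
        // LinkedHashMap keeps the attributes in insertion order for readability.
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("query.progress.point.start", Long.toString(start));
        attrs.put("query.progress.point.end", Long.toString(end));
        attrs.put("query.count.estimate", Long.toString(countEstimate));
        return attrs;
    }

    public static void main(String[] args) {
        long estimate = 17_568_231L; // assumed result of a count query
        long batchSize = 2_500L;     // assumed number of documents per commit
        // Print the attributes for the first three batches only.
        for (long start = 0; start < 3 * batchSize; start += batchSize) {
            long end = Math.min(start + batchSize, estimate);
            System.out.println(forBatch(start, end, estimate));
        }
    }
}
```

In a real processor each such map would presumably be attached to the flow file emitted for that batch before the incremental commit; how that is wired into GetMongo is left open here.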



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
