[ https://issues.apache.org/jira/browse/NIFI-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16518309#comment-16518309 ]
ASF GitHub Bot commented on NIFI-4838:
--------------------------------------
Github user mattyb149 commented on the issue:
https://github.com/apache/nifi/pull/2448
That use case is what prompted the Jira for QueryDatabaseTable to support
incremental commits. QueryDatabaseTable has no "original" relationship, so that
wasn't an issue there, but its documentation explains that some attributes won't
be populated if "Output Batch Size" is set.
Having a source processor (GetMongo) with an "original" relationship is
where it gets a little weird for me. The pattern I'm aware of, which is
becoming increasingly popular, is for source processors to accept incoming flow
files for configuration rather than for their content. That's why I wasn't
sure where the "original" relationship belongs in this use case. Often
"original" is used as a downstream trigger signaling that all documents were
processed, but that wouldn't apply if we were using incremental commits.
I'd prefer to avoid a new processor if we can make it work in a
user-friendly manner for GetMongo. My vote is for my option #1 above, with
sufficient documentation to describe the behavior. If a use case requires the
"original" flow file to be transferred, then users won't be able to use
incremental commits. On the other hand (as @bbende suggested to me), they
could always send the upstream flow to both GetMongo (with incremental commits)
and to another flow. At that point it'd be similar to my option #2, where a
different downstream flow would get the original flow file while it was also
being worked on by GetMongo. By keeping the child flow files connected to the
original as a parent (via provenance and the session), we still have the
lineage intact. Thoughts?
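
To make the incremental-commit idea concrete, here is a minimal Python sketch.
It is not NiFi code: `FakeSession`, `batches`, and `run_incremental` are
hypothetical names made up for illustration, and the batch size is an assumed
setting. It only shows that committing after each batch makes output visible
downstream before the whole result set has been read.

```python
from typing import Iterable, Iterator, List


def batches(docs: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield lists of at most batch_size documents."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


class FakeSession:
    """Hypothetical stand-in for a processor session (not the NiFi API)."""

    def __init__(self) -> None:
        self.pending: List[List[dict]] = []
        self.committed: List[List[dict]] = []
        self.commit_calls = 0

    def transfer(self, batch: List[dict]) -> None:
        # Transferred batches stay pending until the next commit.
        self.pending.append(batch)

    def commit(self) -> None:
        # Everything pending becomes visible downstream.
        self.committed.extend(self.pending)
        self.pending = []
        self.commit_calls += 1


def run_incremental(docs: Iterable[dict], batch_size: int, session: FakeSession) -> None:
    """Commit after every batch instead of once at the end."""
    for batch in batches(docs, batch_size):
        session.transfer(batch)
        session.commit()


session = FakeSession()
run_incremental(({"_id": i} for i in range(5)), batch_size=2, session=session)
```

With a single commit at the end, nothing would reach `committed` until every
document had been read, which is the "looks like it has hung" behavior the
issue describes.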
> Make GetMongo support multiple commits and give some progress indication
> ------------------------------------------------------------------------
>
> Key: NIFI-4838
> URL: https://issues.apache.org/jira/browse/NIFI-4838
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Mike Thomsen
> Assignee: Mike Thomsen
> Priority: Major
>
> It shouldn't wait until the end to do a commit() call because the effect is
> that GetMongo looks like it has hung to a user who is pulling a very large
> data set.
> It should also have an option for running a count query to get the current
> approximate count of documents that would match the query and append an
> attribute that indicates where a flowfile stands in the total result count.
> Ex:
> query.progress.point.start = 2500
> query.progress.point.end = 5000
> query.count.estimate = 17,568,231
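
The progress attributes in the example above could be derived roughly as
follows. This is a Python sketch under stated assumptions, not GetMongo's
implementation: `progress_attributes` is a hypothetical helper, batches are
assumed to be numbered from zero, "start" is assumed to be the offset of the
first document in the batch, and "end" is assumed to be the offset just past
the last one. The comma formatting of the estimate simply mirrors the example.

```python
from typing import Dict


def progress_attributes(batch_index: int, batch_size: int, count_estimate: int) -> Dict[str, str]:
    """Build the progress attributes for one batch (hypothetical helper)."""
    start = batch_index * batch_size  # assumed: offset of the first document
    end = start + batch_size          # assumed: offset just past the last document
    return {
        "query.progress.point.start": str(start),
        "query.progress.point.end": str(end),
        # The count is only an estimate, so "end" is not capped against it.
        "query.count.estimate": f"{count_estimate:,}",
    }


attrs = progress_attributes(batch_index=1, batch_size=2500, count_estimate=17568231)
for name, value in attrs.items():
    print(f"{name} = {value}")
```

For the second batch of 2500 documents this reproduces the three values shown
in the issue's example.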