[ https://issues.apache.org/jira/browse/NIFI-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16517285#comment-16517285 ]
ASF GitHub Bot commented on NIFI-4838:
--------------------------------------
Github user mattyb149 commented on the issue:
https://github.com/apache/nifi/pull/2448
I can't find any examples of a processor where an "original" relationship is
available alongside progressive commits, for the reason you mention above.
Progressive commits are available in QueryDatabaseTable, but it is a source
processor and doesn't accept incoming flow files. The "original" relationship
appears in processors like SplitXYZ, where the content is changed but the
original input might need to be preserved, and all flow files are committed
once at the end. [NIFI-2878](https://issues.apache.org/jira/browse/NIFI-2878)
is an open issue to allow "streaming splits", but my guess is that either the
"original" relationship will not be available there, or we'd need consensus
that "original" can behave differently when partial commits are in use.
I've considered a couple of approaches for this:
1) Do not transfer the input flow file to "original" when doing incremental
commits
2) Transfer the input flow file to "original" on the first incremental
commit
My concern with the former is that it may still be necessary to access the
input flow file. My concern with the latter is that it kind of implies that an
operation has succeeded, when processing that continues after the first commit
can still fail. At that point the flow may assume all flow files have been
processed (some flows count on "original" being emitted at the end of
successful processing and use it as a trigger). I'm not sure which is more
user-friendly; thoughts?
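To make the trade-off concrete, here is a minimal Java sketch that models the two options outside of NiFi. It does not use the real `ProcessSession` API: the `OriginalPolicy` enum, the `run` method, and the string event log are hypothetical stand-ins, with each incremental commit recorded as a log entry. It shows that under option 2 a downstream consumer can already have received "original" when a later batch fails, while under option 1 "original" never appears, even when every batch succeeds.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the two options for routing the input flow file to
 * "original" when results are committed in batches. This is NOT the NiFi
 * ProcessSession API; commits are only recorded as strings in an event log.
 */
class OriginalRoutingSketch {

    enum OriginalPolicy {
        /** Option 1: never transfer the input to "original" during incremental commits. */
        DROP_ORIGINAL,
        /** Option 2: transfer the input to "original" together with the first incremental commit. */
        ORIGINAL_ON_FIRST_COMMIT
    }

    /**
     * Processes totalBatches batches, committing after each one. If failAtBatch
     * is in [0, totalBatches), processing stops with a failure before that batch
     * is committed; pass -1 for a run without failures. Returns the event log.
     */
    static List<String> run(OriginalPolicy policy, int totalBatches, int failAtBatch) {
        List<String> events = new ArrayList<>();
        for (int batch = 0; batch < totalBatches; batch++) {
            if (batch == failAtBatch) {
                events.add("FAILED at batch " + batch);
                return events; // earlier commits cannot be rolled back
            }
            if (batch == 0 && policy == OriginalPolicy.ORIGINAL_ON_FIRST_COMMIT) {
                events.add("original -> committed");
            }
            events.add("batch " + batch + " -> committed");
        }
        return events;
    }

    public static void main(String[] args) {
        // Option 2 with a failure on the third batch: "original" was already committed.
        System.out.println(run(OriginalPolicy.ORIGINAL_ON_FIRST_COMMIT, 5, 2));
        // Option 1 on a fully successful run: "original" is never emitted.
        System.out.println(run(OriginalPolicy.DROP_ORIGINAL, 3, -1));
    }
}
```

In the first log printed by `main`, the failure comes after "original" has been committed, which is exactly the situation where a flow using "original" as a trigger would fire too early.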
> Make GetMongo support multiple commits and give some progress indication
> ------------------------------------------------------------------------
>
> Key: NIFI-4838
> URL: https://issues.apache.org/jira/browse/NIFI-4838
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Mike Thomsen
> Assignee: Mike Thomsen
> Priority: Major
>
> It shouldn't wait until the end to call commit(), because to a user who is
> pulling a very large data set, GetMongo looks like it has hung.
> It should also have an option for running a count query to get the current
> approximate number of documents matching the query, and for appending
> attributes that indicate where a flow file stands in the total result count.
> Ex:
> query.progress.point.start = 2500
> query.progress.point.end = 5000
> query.count.estimate = 17,568,231
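A minimal sketch of the requested behavior, assuming documents arrive from a cursor-like `Iterator` and that each batch is written out and committed through a callback. The `ProgressBatcher` class, the `drain` method, and the callback are hypothetical, not GetMongo's actual code. It also assumes `query.progress.point.start` is the 0-based offset of a batch's first document and `query.progress.point.end` is the offset just past its last one, which matches the 2500/5000 example for a batch size of 2500. The count estimate is passed in as a stand-in for the result of a separate count query, and is written without thousands separators.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/**
 * Hypothetical sketch: drain query results in batches and tag each batch with
 * progress attributes. Documents are plain strings; the commit callback stands
 * in for writing a flow file and committing the session.
 */
class ProgressBatcher {

    /**
     * Reads the cursor in batches of batchSize and invokes commit once per batch
     * with that batch's documents and progress attributes.
     */
    static void drain(Iterator<String> cursor, int batchSize, long countEstimate,
                      BiConsumer<List<String>, Map<String, String>> commit) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long offset = 0; // 0-based offset of the current batch's first document
        List<String> batch = new ArrayList<>();
        while (cursor.hasNext()) {
            batch.add(cursor.next());
            // Emit when the batch is full or the cursor is exhausted.
            if (batch.size() == batchSize || !cursor.hasNext()) {
                Map<String, String> attrs = new LinkedHashMap<>();
                attrs.put("query.progress.point.start", Long.toString(offset));
                attrs.put("query.progress.point.end", Long.toString(offset + batch.size()));
                attrs.put("query.count.estimate", Long.toString(countEstimate));
                commit.accept(batch, attrs); // commit this batch now, not at the end
                offset += batch.size();
                batch = new ArrayList<>();
            }
        }
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 7; i++) {
            docs.add("doc" + i);
        }
        // The estimate is supplied by hand here; in practice it would come from a count query.
        drain(docs.iterator(), 3, 7, (b, attrs) -> System.out.println(attrs + " " + b));
    }
}
```

With 7 documents and a batch size of 3, the callback fires three times, with start/end values of 0/3, 3/6 and 6/7. Committing inside the loop, rather than once after it, is what would let a user see output while a large query is still running.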
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)