GitHub user mattyb149 commented on the issue:
https://github.com/apache/nifi/pull/2448
That use case is what prompted the Jira for QueryDatabaseTable to support
incremental commits. QueryDatabaseTable doesn't have an "original" relationship,
so that wasn't an issue there, but its documentation explains that some
attributes won't be populated if "Output Batch Size" is specified.
Having a source processor (GetMongo) with an "original" relationship is
where it got a little weird for me, since the pattern I'm aware of (and one
that is getting increasingly popular) is to have source processors accept
incoming flow files for configuration purposes, not for their content per se.
That's why I wasn't sure where the "original" relationship belonged in this
use case. Often "original" is used as a downstream signal that all documents
have been processed, but that wouldn't apply if we were using incremental
commits.
I'd prefer to avoid a new processor if we can make this work in a
user-friendly manner for GetMongo. My vote is for my option #1 above, with
enough documentation to describe the behavior. If the use case requires the
"original" flow file to be transferred, then the user won't be able to use
incremental commits. On the other hand (as @bbende suggested to me), they
could always route the upstream flow to both GetMongo (with incremental
commits) and to another flow. At that point it'd be similar to my option #2,
where a different downstream flow would get the original flow file while it
was also being worked on by GetMongo. By keeping the child flow files
connected to the original as a parent (via provenance and the session), we
still keep the lineage intact. Thoughts?
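To make the lineage point concrete, here is a rough sketch against NiFi's ProcessSession API of how a source processor can keep child flow files tied to the incoming flow file. This is illustrative only, not the actual GetMongo implementation; `fetchDocuments`, `REL_SUCCESS`, and the use of `Document.toJson()` are assumptions for the sketch.

```java
FlowFile original = session.get();
if (original == null) {
    return;
}
for (Document doc : fetchDocuments(context, original)) {  // hypothetical query helper
    // create(parent) records a FORK provenance event, so the child's
    // lineage traces back to the incoming flow file.
    FlowFile child = session.create(original);
    child = session.write(child,
            out -> out.write(doc.toJson().getBytes(StandardCharsets.UTF_8)));
    session.transfer(child, REL_SUCCESS);
}
// Option #1 behavior: no "original" relationship; drop the incoming flow file.
// (Transferring it to an "original" relationship at the end is what would
// conflict with committing the session incrementally per batch, since a
// session cannot commit while a flow file is still unaccounted for.)
session.remove(original);
```

The key design point is that lineage comes from `session.create(parent)`, not from transferring the parent anywhere, so dropping "original" doesn't lose provenance.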
---