GitHub user mattyb149 commented on the issue:

    https://github.com/apache/nifi/pull/2448
  
    That use case is what prompted the Jira for QueryDatabaseTable to support 
incremental commits. QueryDatabaseTable doesn't have an "original" relationship, 
so that wasn't an issue there, but the documentation explains that some 
attributes won't be populated if "Output Batch Size" is specified.
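    The reason those attributes can't be populated can be sketched with a toy model (plain Python, not NiFi code; the attribute name `total.record.count` here is illustrative, not the real attribute): once a batch is committed mid-query, the total result count isn't known yet, so a total-count attribute can only be set when all results are held until the query finishes.

```python
def run_query(results, output_batch_size=None):
    """Toy model of a source processor emitting query results.

    Returns a list of committed batches as (records, attributes) pairs.
    With output_batch_size set, each batch is committed as soon as it
    fills, before the query finishes, so no batch can carry a
    total-count attribute. Without it, everything is held until the
    end, where the total is known.
    """
    committed = []
    if output_batch_size is None:
        # Single commit at the end: total count is known.
        batch = list(results)
        committed.append((batch, {"total.record.count": len(batch)}))
        return committed
    batch = []
    for record in results:
        batch.append(record)
        if len(batch) == output_batch_size:
            # Committed mid-query: total count not yet known.
            committed.append((batch, {}))
            batch = []
    if batch:
        committed.append((batch, {}))  # final partial batch
    return committed
```

    The same logic is why an "original" flow file is awkward here: there is no single end-of-query point at which to release it if the children have already been committed incrementally.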
    
    Having a source processor (GetMongo) with an "original" relationship is 
where it gets a little weird for me: the pattern I'm aware of (and one that is 
getting increasingly popular) is for source processors to accept incoming flow 
files for configuration, not for their content per se. That's why I wasn't sure 
where the "original" relationship belonged in this use case. Often "original" is 
used as a downstream trigger that all documents were processed, but that 
wouldn't apply if we were using incremental commits.
    
    I'd prefer to avoid a new processor if we can make this work in a 
user-friendly manner for GetMongo. My vote is for my option #1 above, with 
sufficient documentation to describe the behavior. If the use case dictates that 
the "original" flow file be transferred, then users won't be able to use 
incremental commits. On the other hand (as @bbende suggested to me), they could 
always send the upstream flow to both GetMongo (with incremental commits) and to 
another flow. At that point it'd be similar to my option #2, where a different 
downstream flow would get the original flow file while it was also being worked 
on by GetMongo. By keeping the child flow files connected to the original as a 
parent (via provenance and the session), we still keep the lineage intact. 
Thoughts?

