Mark Payne created NIFI-7476:
--------------------------------

             Summary: Allow users to configure FlowFile Concurrency on a Process Group
                 Key: NIFI-7476
                 URL: https://issues.apache.org/jira/browse/NIFI-7476
             Project: Apache NiFi
          Issue Type: New Feature
          Components: Core Framework, Core UI, Documentation & Website
            Reporter: Mark Payne
            Assignee: Mark Payne


The Wait/Notify processors are used quite heavily. These processors are very 
powerful and support many different use cases. However, this power comes at 
the expense of making the processors difficult to configure.

The most common use case, it seems, is to simply allow a Process Group to 
process only a single FlowFile at a time. We see questions about how to 
accomplish this fairly frequently in Slack and on the mailing list.

I propose that we add a new feature to NiFi so that when a user configures a 
Process Group, they can configure the FlowFile Concurrency: either unbounded 
(which is the current behavior) or a single FlowFile at a time on each node. In 
the latter case, only a single FlowFile will be ingested by a Local Input Port, 
and no more FlowFiles will be ingested as long as there is data queued in the 
Process Group. Once all data has left the Process Group, the next FlowFile will 
be allowed through.
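To make the proposed admission rule concrete, here is a minimal Java sketch of 
a per-node gate that a Local Input Port could consult before pulling a FlowFile 
into the group. The enum, the class, and all method names are hypothetical and 
chosen for illustration. They are not NiFi's actual API. The sketch assumes the 
framework can report each FlowFile that enters the group, is created inside it, 
or leaves it.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical settings mirroring the two options proposed in this issue.
enum FlowFileConcurrencyMode {
    UNBOUNDED,               // current behavior: Input Ports admit data freely
    SINGLE_FLOWFILE_PER_NODE // admit one FlowFile, then wait until the group is empty
}

// Hypothetical per-node gate consulted by a Local Input Port before it pulls
// a FlowFile into the Process Group.
class GroupConcurrencyGate {
    private final FlowFileConcurrencyMode mode;
    // Number of FlowFiles currently anywhere inside the group on this node.
    private final AtomicInteger queuedInGroup = new AtomicInteger();

    GroupConcurrencyGate(FlowFileConcurrencyMode mode) {
        this.mode = mode;
    }

    // Returns true if the Input Port may ingest one FlowFile now.
    boolean tryAdmit() {
        if (mode == FlowFileConcurrencyMode.UNBOUNDED) {
            queuedInGroup.incrementAndGet();
            return true;
        }
        // Atomically admit only when nothing is queued in the group.
        return queuedInGroup.compareAndSet(0, 1);
    }

    // Processors inside the group produced net new FlowFiles (e.g. a split).
    void onFlowFilesCreated(int count) {
        queuedInGroup.addAndGet(count);
    }

    // A FlowFile left the group via an Output Port or was dropped inside it.
    void onFlowFileLeft() {
        queuedInGroup.decrementAndGet();
    }

    int queued() {
        return queuedInGroup.get();
    }
}
```

With SINGLE_FLOWFILE_PER_NODE, tryAdmit() keeps returning false until every 
FlowFile derived from the admitted one has left the group. The compare-and-set 
ensures that two Input Port threads cannot both admit data at the same time.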

This has several advantages over the Wait/Notify pair of Processors. First, 
there's no need to create two Processors and ensure that they are used together 
properly. Second, there are only a few properties to configure. Third, 
implementing this at the framework level with a limited feature set means the 
implementation can be much simpler than that of Wait/Notify, which makes it 
much easier to maintain.

Additionally, a related concept can be easily introduced: the notion of a 
FlowFile Outbound Policy. This is analogous to the FlowFile Concurrency but 
applies to Output Ports. Here, the user could configure the group so that data 
is transferred out of the Process Group as soon as it's available (which is the 
current behavior) or is transferred as a batch. In batch mode, the Output Ports 
would not transfer any data out of the Process Group until all FlowFiles are 
queued up at an Output Port (i.e., all processing has finished).
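The batch outbound policy can be sketched in the same way. This is also a 
hypothetical, single-threaded model and not NiFi code. It assumes the framework 
tracks how many FlowFiles are still in flight inside the group, and it holds 
everything that reaches an Output Port until that count drops to zero. The enum 
and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical outbound policies mirroring the two options described above.
enum OutboundPolicy {
    STREAM, // transfer data out as soon as it reaches an Output Port (current behavior)
    BATCH   // hold data at Output Ports until all processing in the group has finished
}

// Hypothetical, single-threaded (not thread-safe) model of Output Port gating.
class OutboundGate {
    private final OutboundPolicy policy;
    private int inFlight; // FlowFiles inside the group not yet at an Output Port
    private final List<String> waiting = new ArrayList<>(); // IDs held at Output Ports

    OutboundGate(OutboundPolicy policy) {
        this.policy = policy;
    }

    void onFlowFileEntered() {
        inFlight++;
    }

    // A FlowFile was dropped inside the group (e.g. auto-terminated).
    List<String> onFlowFileRemoved() {
        inFlight--;
        return releaseIfAllowed();
    }

    // A FlowFile reached an Output Port; returns the IDs that may now leave the group.
    List<String> onArrivedAtOutputPort(String flowFileId) {
        inFlight--;
        waiting.add(flowFileId);
        return releaseIfAllowed();
    }

    private List<String> releaseIfAllowed() {
        boolean allowed = policy == OutboundPolicy.STREAM || inFlight == 0;
        if (!allowed || waiting.isEmpty()) {
            return Collections.emptyList();
        }
        List<String> released = new ArrayList<>(waiting);
        waiting.clear();
        return released;
    }
}
```

With BATCH, three FlowFiles entering the group produce no output until the third 
one reaches an Output Port, and then all three are released together. With 
STREAM, each FlowFile is released as soon as it arrives.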

This allows for very simple configuration for an oft-requested capability: the 
ability to perform some action only after processing of a batch of data has 
completed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
