Mark Payne created NIFI-7476:
--------------------------------
Summary: Allow users to configure FlowFile Concurrency on a
Process Group
Key: NIFI-7476
URL: https://issues.apache.org/jira/browse/NIFI-7476
Project: Apache NiFi
Issue Type: New Feature
Components: Core Framework, Core UI, Documentation & Website
Reporter: Mark Payne
Assignee: Mark Payne
The Wait/Notify processors are used quite heavily. These processors are very
powerful and allow for many different use cases. However, this power comes at
the expense of making the processors difficult to configure.
The most common use case, it seems, is to simply allow a Process Group to
process only a single FlowFile at a time. We see questions about how to
accomplish this fairly frequently in Slack and on the mailing list.
I propose that we add a new feature to NiFi so that when a user configures a
Process Group, they can configure the FlowFile Concurrency: either unbounded
(which is the current behavior) or a single FlowFile at a time on each node. In
the latter case, only a single FlowFile will be ingested by a Local Input Port,
and no more FlowFiles will be ingested as long as there is data queued in the
Process Group. Once all data has left the Process Group, the next FlowFile will
be allowed through.
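
The gating rule described above can be sketched as a small model. This is only
an illustration of the proposed semantics, not NiFi framework code: the class
SingleFlowFileGate and its methods (offer, tryIngest, forked, left) are
hypothetical names, and FlowFiles are represented by plain String ids.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical model of "single FlowFile at a time" on one node.
// None of these names come from the NiFi API.
final class SingleFlowFileGate {
    // FlowFiles waiting in the queue in front of the Local Input Port.
    private final Queue<String> waitingAtInputPort = new ArrayDeque<>();
    // Number of FlowFiles currently queued anywhere inside the Process Group.
    private int queuedInsideGroup = 0;

    // A FlowFile arrives at the Input Port and waits to be ingested.
    void offer(String flowFileId) {
        waitingAtInputPort.add(flowFileId);
    }

    // Ingest the next FlowFile only if the group holds no data.
    // Returns the ingested id, or null if nothing was ingested.
    String tryIngest() {
        if (queuedInsideGroup > 0 || waitingAtInputPort.isEmpty()) {
            return null;
        }
        queuedInsideGroup++;
        return waitingAtInputPort.poll();
    }

    // A component inside the group produced additional FlowFiles (e.g. a split).
    void forked(int additionalFlowFiles) {
        queuedInsideGroup += additionalFlowFiles;
    }

    // One FlowFile left the group (transferred out or dropped).
    void left() {
        if (queuedInsideGroup == 0) {
            throw new IllegalStateException("no FlowFiles inside the group");
        }
        queuedInsideGroup--;
    }
}
```

Note that in this sketch the gate reopens only once the count of FlowFiles
inside the group returns to zero, so a FlowFile that is split into several
children still blocks the Input Port until every child has left.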
This has several advantages over the Wait/Notify pair of Processors. Firstly,
there's no need to create a pair of Processors and ensure that they are used
together properly. Secondly, there aren't a lot of properties
to configure. Thirdly, implementing this at the framework level and with
limited features means the implementation can be much simpler than that of
Wait/Notify, which means it is much easier to maintain.
Additionally, a related concept can be easily introduced: the notion of a
FlowFile Outbound Policy. This is analogous to the FlowFile Concurrency but is
related to Output Ports. Here, the user could configure the group such that data
should be transferred out of the Process Group as soon as it's available (which
is the current behavior) or could be transferred as a batch. In the batch mode,
the Output Ports would not transfer any data out of the Process Group until all
FlowFiles are queued up at an Output Port (i.e., all processing has finished).
This allows for very simple configuration for an oft-requested capability: the
ability to perform some action only after processing of a batch of data has
completed.
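
The batch mode described above can be modeled in the same spirit. Again, this
is a sketch of the proposed behavior under stated assumptions, not an
implementation: BatchOutputModel and its methods are hypothetical names, and
the model assumes every ingested FlowFile eventually reaches an Output Port.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of a batch FlowFile Outbound Policy.
// None of these names come from the NiFi API.
final class BatchOutputModel {
    // FlowFiles inside the group that have not yet reached an Output Port.
    private int stillProcessing = 0;
    // FlowFiles queued at the group's Output Ports, in arrival order.
    private final List<String> atOutputPorts = new ArrayList<>();

    // A FlowFile enters the group and starts being processed.
    void ingest() {
        stillProcessing++;
    }

    // A FlowFile finishes processing and is queued at an Output Port.
    void reachOutputPort(String flowFileId) {
        if (stillProcessing == 0) {
            throw new IllegalStateException("no FlowFile is being processed");
        }
        stillProcessing--;
        atOutputPorts.add(flowFileId);
    }

    // Transfer data out only when everything in the group sits at an Output Port.
    // Returns the released batch, or an empty list if the batch is not complete.
    List<String> tryTransferOut() {
        if (stillProcessing > 0 || atOutputPorts.isEmpty()) {
            return Collections.emptyList();
        }
        List<String> batch = new ArrayList<>(atOutputPorts);
        atOutputPorts.clear();
        return batch;
    }
}
```

In this model, a downstream action connected to the Output Port sees either
nothing or the complete batch, which is what makes "do something once the
batch has finished" simple to express.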
--
This message was sent by Atlassian Jira
(v8.3.4#803005)