Matt Burgess created NIFI-2735:
----------------------------------
Summary: Add processor to perform simple aggregations
Key: NIFI-2735
URL: https://issues.apache.org/jira/browse/NIFI-2735
Project: Apache NiFi
Issue Type: New Feature
Components: Extensions
Reporter: Matt Burgess
Assignee: Matt Burgess
This is a proposal for a new processor (AggregateValues, for example) that can
perform simple aggregation operations such as count, sum, average, min, max,
and concatenate, over a set of "related" flow files. For example, when a JSON
file is split on an array (using the SplitJson processor), the total count of
the splits, the index of each split, and the unique identifier (shared by each
split) are stored as attributes in each flow file sent to the "splits"
relationship:
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SplitJson/index.html
These attributes are the "fragment.*" attributes in the documentation for
SplitText, SplitXml, and SplitJson, for example.
Such a processor could perform these operations for each flow file split from
the original document, and when all documents from a split have been processed,
a flow file could be transferred to an "aggregate" relationship containing
attributes for the operation, aggregate value, etc.
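
As a rough illustration of the behavior described above (not NiFi code, and
not the eventual processor API), the following Python sketch groups incoming
flow files by "fragment.identifier" and emits a single set of aggregate
attributes once "fragment.count" fragments have arrived. The class name
FragmentAggregator, the "value" input attribute, and the "aggregate.*" output
attribute names are all hypothetical and chosen only for this example.

```python
from typing import Callable, Dict, List, Optional

# Operations named in the proposal. Attribute values are strings, so the
# numeric operations convert to float before aggregating.
OPERATIONS: Dict[str, Callable[[List[str]], object]] = {
    "count": lambda vals: len(vals),
    "sum": lambda vals: sum(float(v) for v in vals),
    "average": lambda vals: sum(float(v) for v in vals) / len(vals),
    "min": lambda vals: min(float(v) for v in vals),
    "max": lambda vals: max(float(v) for v in vals),
    "concatenate": lambda vals: ",".join(vals),
}


class FragmentAggregator:
    """Hypothetical stand-in for the proposed AggregateValues processor."""

    def __init__(self, operation: str, value_attribute: str = "value"):
        if operation not in OPERATIONS:
            raise ValueError(f"unsupported operation: {operation}")
        self.operation = operation
        # Assumption: the value to aggregate is carried in a flow file attribute.
        self.value_attribute = value_attribute
        # fragment.identifier -> {fragment.index: raw attribute value}
        self._pending: Dict[str, Dict[int, str]] = {}

    def on_flow_file(self, attributes: Dict[str, str]) -> Optional[Dict[str, str]]:
        """Record one fragment; return aggregate attributes when the group is complete."""
        group_id = attributes["fragment.identifier"]
        expected = int(attributes["fragment.count"])
        index = int(attributes["fragment.index"])

        values = self._pending.setdefault(group_id, {})
        values[index] = attributes[self.value_attribute]
        if len(values) < expected:
            return None  # still waiting for the rest of the group

        # All fragments seen: aggregate in index order and forget the group.
        del self._pending[group_id]
        ordered = [values[i] for i in sorted(values)]
        result = OPERATIONS[self.operation](ordered)
        return {
            "fragment.identifier": group_id,
            "aggregate.operation": self.operation,
            "aggregate.value": str(result),
            "aggregate.count": str(expected),
        }
```

In this sketch a return value of None corresponds to "nothing is sent to the
aggregate relationship yet"; a returned dictionary corresponds to the single
flow file that would be transferred there.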
An interesting application of this (besides the actual aggregation operations)
is that you can use the "aggregate" relationship as an event trigger. For
example, if you need to wait until all files from a group are processed, you can
use AggregateValues and the "aggregate" relationship to indicate downstream
that the entire group has been processed. If there is no Split processor
upstream, then the attributes (fragment.*) would have to be manipulated by the
data flow designer, but this can be accomplished with other processors
(including the scripting processors if necessary).
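
When no Split processor is upstream, the grouping attributes have to be
assigned some other way. The Python sketch below shows one way the three
"fragment.*" attributes could be stamped onto an arbitrary group of flow
files. The function name tag_group is hypothetical, attributes are modeled as
plain dictionaries, and zero-based indexing is an assumption made for this
example rather than something the proposal specifies.

```python
import uuid
from typing import Dict, List, Optional


def tag_group(flow_files: List[Dict[str, str]],
              group_id: Optional[str] = None) -> List[Dict[str, str]]:
    """Return copies of the attribute maps with fragment.* attributes added.

    Every member of the group shares one fragment.identifier and
    fragment.count; fragment.index records its position in the group.
    """
    # Assumption: any unique string works as the shared identifier.
    identifier = group_id if group_id is not None else str(uuid.uuid4())
    count = str(len(flow_files))

    tagged = []
    for index, attributes in enumerate(flow_files):
        updated = dict(attributes)  # leave the caller's dictionaries untouched
        updated["fragment.identifier"] = identifier
        updated["fragment.index"] = str(index)  # zero-based in this sketch
        updated["fragment.count"] = count
        tagged.append(updated)
    return tagged
```

A downstream aggregating step could then treat the group exactly like the
output of a Split processor and signal completion once fragment.count members
have been seen.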