[ https://issues.apache.org/jira/browse/NIFI-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344254#comment-16344254 ]
Matt Burgess commented on NIFI-2735:
------------------------------------
Probably not; the QueryRecord processor can handle simple aggregates pretty
well using SQL.
> Add processor to perform simple aggregations
> --------------------------------------------
>
> Key: NIFI-2735
> URL: https://issues.apache.org/jira/browse/NIFI-2735
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Extensions
> Reporter: Matt Burgess
> Assignee: Matt Burgess
> Priority: Major
>
> This is a proposal for a new processor (AggregateValues, for example) that
> can perform simple aggregation operations such as count, sum, average, min,
> max, and concatenate, over a set of "related" flow files. For example, when a
> JSON file is split on an array (using the SplitJson processor), the total
> count of the splits, the index of each split, and the unique identifier
> (shared by each split) are stored as attributes in each flow file sent to the
> "splits" relationship:
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SplitJson/index.html
> These attributes are the "fragment.*" attributes in the documentation for
> SplitText, SplitXml, and SplitJson, for example.
> Such a processor could perform these operations for each flow file split from
> the original document, and when all documents from a split have been
> processed, a flow file could be transferred to an "aggregate" relationship
> containing attributes for the operation, aggregate value, etc.
> An interesting application of this (besides the actual aggregation
> operations) is that you can use the "aggregate" relationship as an event
> trigger. For example, if you need to wait until all files from a group are
> processed, you can use AggregateValues and the "aggregate" relationship to
> indicate downstream that the entire group has been processed. If there is not
> a Split processor upstream, then the attributes (fragment.*) would have to be
> manipulated by the data flow designer, but this can be accomplished with
> other processors (including the scripting processors if necessary).
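The fragment-based aggregation described in the proposal could be sketched
roughly as follows. This is only an illustration in plain Python, not NiFi
code: the FragmentAggregator class, its offer() method, and the "aggregate.*"
output names are hypothetical. Only the "fragment.identifier",
"fragment.index", and "fragment.count" attribute names come from the Split
processors' documentation. Flow files are modeled as an attribute dict plus a
numeric value.

```python
from collections import defaultdict


class FragmentAggregator:
    """Hypothetical stand-in for the proposed AggregateValues processor."""

    def __init__(self):
        # Values seen so far, keyed by the shared fragment.identifier.
        self._values = defaultdict(list)

    def offer(self, attributes, value):
        """Record one split; return the aggregate once the group is complete.

        Returns None while splits from the group are still outstanding,
        which models "nothing sent to the aggregate relationship yet".
        """
        group = attributes["fragment.identifier"]
        expected = int(attributes["fragment.count"])
        self._values[group].append(value)
        if len(self._values[group]) < expected:
            return None
        values = self._values.pop(group)
        # Attribute names below are assumptions made for this sketch.
        return {
            "fragment.identifier": group,
            "aggregate.count": len(values),
            "aggregate.sum": sum(values),
            "aggregate.min": min(values),
            "aggregate.max": max(values),
            "aggregate.average": sum(values) / len(values),
        }


# Three splits of one original document, as a Split processor would tag them.
aggregator = FragmentAggregator()
splits = [
    ({"fragment.identifier": "abc", "fragment.index": str(i), "fragment.count": "3"}, v)
    for i, v in enumerate([4, 10, 1])
]
results = [aggregator.offer(attrs, v) for attrs, v in splits]
```

Only the last call returns a result, so a non-None return value doubles as the
"entire group has been processed" trigger mentioned in the description.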