[
https://issues.apache.org/jira/browse/NIFI-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15468258#comment-15468258
]
ASF GitHub Bot commented on NIFI-2735:
--------------------------------------
Github user olegz commented on the issue:
https://github.com/apache/nifi/pull/988
Reviewing. . .
> Add processor to perform simple aggregations
> --------------------------------------------
>
> Key: NIFI-2735
> URL: https://issues.apache.org/jira/browse/NIFI-2735
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Extensions
> Reporter: Matt Burgess
> Assignee: Matt Burgess
>
> This is a proposal for a new processor (AggregateValues, for example) that
> can perform simple aggregation operations such as count, sum, average, min,
> max, and concatenate, over a set of "related" flow files. For example, when a
> JSON file is split on an array (using the SplitJson processor), the total
> count of the splits, the index of each split, and the unique identifier
> (shared by each split) are stored as attributes in each flow file sent to the
> "splits" relationship:
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SplitJson/index.html
> These attributes are the "fragment.*" attributes in the documentation for
> SplitText, SplitXml, and SplitJson, for example.
> Such a processor could perform these operations for each flow file split from
> the original document, and when all documents from a split have been
> processed, a flow file could be transferred to an "aggregate" relationship
> containing attributes for the operation, aggregate value, etc.
> An interesting application of this (besides the actual aggregation
> operations) is that you can use the "aggregate" relationship as an event
> trigger. For example, if you need to wait until all files from a group are
> processed, you can use AggregateValues and the "aggregate" relationship to
> indicate downstream that the entire group has been processed. If there is not
> a Split processor upstream, then the attributes (fragment.*) would have to be
> manipulated by the data flow designer, but this can be accomplished with
> other processors (including the scripting processors if necessary).
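
The completion logic described above can be sketched roughly as follows. This
is plain Python for illustration, not a NiFi processor. The fragment.identifier,
fragment.index, and fragment.count attribute names are the ones written by the
Split processors; the FragmentAggregator class, the "value" attribute, and the
aggregate.* output attribute names are hypothetical and only stand in for
whatever the real processor would use.

```python
from collections import defaultdict

# Aggregations named in the proposal. Values arrive as attribute strings,
# so numeric operations convert them to float first.
OPERATIONS = {
    "count": lambda vs: len(vs),
    "sum": lambda vs: sum(float(v) for v in vs),
    "average": lambda vs: sum(float(v) for v in vs) / len(vs),
    "min": lambda vs: min(float(v) for v in vs),
    "max": lambda vs: max(float(v) for v in vs),
    "concatenate": lambda vs: ",".join(vs),
}


class FragmentAggregator:
    """Hypothetical stand-in for the proposed AggregateValues processor."""

    def __init__(self, operation, value_attribute="value"):
        if operation not in OPERATIONS:
            raise ValueError("unsupported operation: " + operation)
        self.operation = operation
        self.value_attribute = value_attribute  # assumed attribute name
        # fragment.identifier -> {fragment.index: raw value}
        self._pending = defaultdict(dict)

    def offer(self, attributes):
        """Record one split; return aggregate attributes once the group is complete."""
        group = attributes["fragment.identifier"]
        index = int(attributes["fragment.index"])
        expected = int(attributes["fragment.count"])
        self._pending[group][index] = attributes[self.value_attribute]

        if len(self._pending[group]) < expected:
            return None  # still waiting for the rest of the group

        # All splits seen: order by index, aggregate, and forget the group.
        values = [v for _, v in sorted(self._pending.pop(group).items())]
        return {
            "fragment.identifier": group,
            "aggregate.operation": self.operation,  # assumed output name
            "aggregate.value": OPERATIONS[self.operation](values),  # assumed output name
        }
```

A caller would pass each split's attributes to offer() and treat a non-None
result as the flow file for the "aggregate" relationship, which also serves as
the "whole group has been processed" signal mentioned in the description.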
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)