[
https://issues.apache.org/jira/browse/NIFI-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15467618#comment-15467618
]
ASF GitHub Bot commented on NIFI-2735:
--------------------------------------
GitHub user mattyb149 opened a pull request:
https://github.com/apache/nifi/pull/988
NIFI-2735: Add AggregateValues processor for aggregate operations on flow
file groups
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mattyb149/nifi aggregator
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/nifi/pull/988.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #988
----
commit 0730a98bc67f905dfd9edf479b2fcc7b2426386d
Author: Matt Burgess <[email protected]>
Date: 2016-09-06T14:51:28Z
NIFI-2735: Add AggregateValues processor for aggregate operations on flow
file groups
----
> Add processor to perform simple aggregations
> --------------------------------------------
>
> Key: NIFI-2735
> URL: https://issues.apache.org/jira/browse/NIFI-2735
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Extensions
> Reporter: Matt Burgess
> Assignee: Matt Burgess
>
> This is a proposal for a new processor (AggregateValues, for example) that
> can perform simple aggregation operations such as count, sum, average, min,
> max, and concatenate, over a set of "related" flow files. For example, when a
> JSON file is split on an array (using the SplitJson processor), the total
> count of the splits, the index of each split, and the unique identifier
> (shared by each split) are stored as attributes in each flow file sent to the
> "splits" relationship:
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SplitJson/index.html
> These attributes are the "fragment.*" attributes in the documentation for
> SplitText, SplitXml, and SplitJson, for example.
> Such a processor could perform these operations for each flow file split from
> the original document, and when all documents from a split have been
> processed, a flow file could be transferred to an "aggregate" relationship
> containing attributes for the operation, aggregate value, etc.
> An interesting application of this (besides the actual aggregation
> operations) is that you can use the "aggregate" relationship as an event
> trigger. For example, if you need to wait until all files from a group are
> processed, you can use AggregateValues and the "aggregate" relationship to
> indicate downstream that the entire group has been processed. If there is not
> a Split processor upstream, then the attributes (fragment.*) would have to be
> manipulated by the data flow designer, but this can be accomplished with
> other processors (including the scripting processors if necessary).
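
The behavior described in the issue can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code from the pull request and not the NiFi processor API. The class `FragmentAggregator`, the `value` attribute, and the `aggregate.*` output attribute names are assumptions made up for this example; only `fragment.identifier`, `fragment.index`, and `fragment.count` come from the Split processor documentation referenced above. The sketch also assumes each split arrives exactly once.

```python
from dataclasses import dataclass, field


@dataclass
class GroupState:
    """Values collected so far for one group of related flow files."""
    expected: int                               # taken from fragment.count
    values: list = field(default_factory=list)  # numeric values seen so far


class FragmentAggregator:
    """Hypothetical stand-in for the proposed processor (not NiFi API)."""

    def __init__(self, value_attribute="value"):
        # Name of the attribute holding the number to aggregate (assumed).
        self.value_attribute = value_attribute
        self.groups = {}

    def on_flow_file(self, attributes):
        """Record one split.

        Returns a dict of aggregate attributes once every split of the
        group has been seen, otherwise None.
        """
        group_id = attributes["fragment.identifier"]
        expected = int(attributes["fragment.count"])
        state = self.groups.setdefault(group_id, GroupState(expected))
        state.values.append(float(attributes[self.value_attribute]))

        if len(state.values) < state.expected:
            return None  # group incomplete, nothing goes to "aggregate" yet

        # Group complete: drop its state and emit the aggregate attributes.
        del self.groups[group_id]
        vals = state.values
        return {
            "fragment.identifier": group_id,
            "aggregate.count": len(vals),
            "aggregate.sum": sum(vals),
            "aggregate.average": sum(vals) / len(vals),
            "aggregate.min": min(vals),
            "aggregate.max": max(vals),
        }


# Usage: three splits of one document; attribute values are strings,
# as flow file attributes are in NiFi.
agg = FragmentAggregator()
splits = [
    {"fragment.identifier": "abc", "fragment.index": str(i),
     "fragment.count": "3", "value": v}
    for i, v in enumerate(["4", "1", "7"])
]
results = [agg.on_flow_file(s) for s in splits]
```

Only the last call returns a result, which is what makes the "aggregate" output usable as a completion signal for the whole group.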
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)