[ https://issues.apache.org/jira/browse/NIFI-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt Burgess updated NIFI-2735:
-------------------------------
    Resolution: Won't Fix
        Status: Resolved  (was: Patch Available)

Resolving, as QueryRecord and/or NIFI-5291 cover this.

> Add processor to perform simple aggregations
> --------------------------------------------
>
>                 Key: NIFI-2735
>                 URL: https://issues.apache.org/jira/browse/NIFI-2735
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Matt Burgess
>            Priority: Major
>
> This is a proposal for a new processor (AggregateValues, for example) that
> can perform simple aggregation operations such as count, sum, average, min,
> max, and concatenate over a set of "related" flow files. For example, when a
> JSON file is split on an array (using the SplitJson processor), the total
> count of the splits, the index of each split, and the unique identifier
> (shared by all splits) are stored as attributes in each flow file sent to the
> "splits" relationship:
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SplitJson/index.html
> These are the "fragment.*" attributes listed in the documentation for
> SplitText, SplitXml, and SplitJson, for example.
>
> Such a processor could perform these operations for each flow file split from
> the original document, and once all flow files from a split have been
> processed, a flow file could be transferred to an "aggregate" relationship
> containing attributes for the operation, the aggregate value, etc.
>
> An interesting application of this (besides the actual aggregation
> operations) is that you can use the "aggregate" relationship as an event
> trigger. For example, if you need to wait until all files from a group are
> processed, you can use AggregateValues and the "aggregate" relationship to
> indicate downstream that the entire group has been processed.
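The core of the proposal (group splits by `fragment.identifier`, treat the group as complete once `fragment.count` distinct fragments have arrived, then emit a single result) can be sketched outside NiFi. The Python below is a minimal in-memory model under stated assumptions: `AggregateValuesSketch`, `on_flowfile`, the `amount` value attribute, and the `aggregate.*` output attribute names are all hypothetical, not NiFi APIs, and flow files are reduced to plain attribute dicts. A non-`None` return stands in for a flow file on the "aggregate" relationship, which is also what lets it act as the "whole group processed" trigger.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class GroupState:
    """Running state for one group of splits sharing a fragment.identifier."""
    expected: int                                         # taken from fragment.count
    values: Dict[int, str] = field(default_factory=dict)  # fragment.index -> value


class AggregateValuesSketch:
    """Hypothetical in-memory model of the proposed AggregateValues processor.

    Not NiFi code: a flow file is modeled as a dict of attributes, and the value
    to aggregate is read from the attribute named by `value_attribute`.
    """

    NUMERIC: Dict[str, Callable[[List[float]], float]] = {
        "sum": sum,
        "min": min,
        "max": max,
        "average": lambda xs: sum(xs) / len(xs),
    }

    def __init__(self, operation: str, value_attribute: str) -> None:
        if operation not in ("count", "concatenate", *self.NUMERIC):
            raise ValueError(f"unsupported operation: {operation}")
        self.operation = operation
        self.value_attribute = value_attribute
        self.groups: Dict[str, GroupState] = {}

    def on_flowfile(self, attrs: Dict[str, str]) -> Optional[Dict[str, str]]:
        """Consume one split. Return the attributes of the "aggregate" flow file
        once every fragment of its group has arrived, otherwise None."""
        group_id = attrs["fragment.identifier"]
        state = self.groups.setdefault(group_id, GroupState(int(attrs["fragment.count"])))
        # Keyed by fragment.index, so a re-delivered fragment is not counted twice.
        state.values.setdefault(int(attrs["fragment.index"]), attrs[self.value_attribute])
        if len(state.values) < state.expected:
            return None
        del self.groups[group_id]  # group complete: drop its state
        ordered = [state.values[i] for i in sorted(state.values)]
        return {
            "fragment.identifier": group_id,
            "aggregate.operation": self.operation,
            "aggregate.value": self._aggregate(ordered),
        }

    def _aggregate(self, values: List[str]) -> str:
        if self.operation == "count":
            return str(len(values))
        if self.operation == "concatenate":
            return ",".join(values)  # joined in fragment.index order
        return str(self.NUMERIC[self.operation]([float(v) for v in values]))


agg = AggregateValuesSketch("sum", "amount")
splits = [  # three splits of one document, arriving out of order
    {"fragment.identifier": "doc-1", "fragment.index": "1",
     "fragment.count": "3", "amount": "2.5"},
    {"fragment.identifier": "doc-1", "fragment.index": "0",
     "fragment.count": "3", "amount": "1.5"},
    {"fragment.identifier": "doc-1", "fragment.index": "2",
     "fragment.count": "3", "amount": "6"},
]
for split in splits:
    result = agg.on_flowfile(split)
    if result is not None:  # stands in for the "aggregate" relationship / trigger
        print(result)
# {'fragment.identifier': 'doc-1', 'aggregate.operation': 'sum', 'aggregate.value': '10.0'}
```

The sketch keeps its state only in memory and never emits anything for a group whose last fragment does not arrive; both cases are left open here.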
> If there is not a Split processor upstream, then the attributes (fragment.*)
> would have to be manipulated by the data flow designer, but this can be
> accomplished with other processors (including the scripting processors if
> necessary).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
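For the case the issue mentions where no Split processor is upstream, the flow designer has to put the same kind of `fragment.*` attributes on every flow file of a group. In NiFi that would be done with other processors (or a scripting processor), as the description says; the Python below only models which attributes end up on each flow file. The helper name `assign_fragment_attributes`, the random UUID as the shared identifier, and the 0-based index are assumptions made for this sketch.

```python
import uuid
from typing import Dict, List


def assign_fragment_attributes(batch: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Hypothetical helper: return copies of `batch` carrying fragment.*
    attributes that mark its members as one group of a known size."""
    group_id = str(uuid.uuid4())  # one identifier shared by every member
    count = str(len(batch))       # lets a downstream aggregator detect completion
    return [
        {**attrs,
         "fragment.identifier": group_id,
         "fragment.index": str(index),  # 0-based here (assumption)
         "fragment.count": count}
        for index, attrs in enumerate(batch)
    ]


stamped = assign_fragment_attributes([{"filename": "a.json"}, {"filename": "b.json"}])
for attrs in stamped:
    print(attrs["filename"], attrs["fragment.index"], attrs["fragment.count"])
# a.json 0 2
# b.json 1 2
```

The input dicts are copied rather than mutated, so the original attributes stay untouched.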