[ https://issues.apache.org/jira/browse/NIFI-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379256#comment-16379256 ]
ASF GitHub Bot commented on NIFI-4872:
--------------------------------------
Github user markap14 commented on a diff in the pull request:
https://github.com/apache/nifi/pull/2475#discussion_r171064049
--- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/MergeContent.java ---
@@ -131,6 +133,7 @@
@WritesAttribute(attribute = "merge.bin.age", description = "The age of the bin, in milliseconds, when it was merged and output. Effectively "
        + "this is the greatest amount of time that any FlowFile in this bundle remained waiting in this processor before it was output") })
@SeeAlso({SegmentContent.class, MergeRecord.class})
+@SystemResourceConsideration(resource = SystemResource.MEMORY)
--- End diff --
It would probably be helpful here to add a description explaining that the
content itself is not stored in memory, but the FlowFiles' attributes are, and
that the configuration (max bin size, etc.) will influence how much heap is
used. I would also call out that if merging together many small FlowFiles, a
two-stage approach may be necessary in order to avoid running out of memory.
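To make the heap argument concrete, here is a rough back-of-the-envelope sketch. It is not NiFi code: the class and method names are hypothetical, and the ~1 KB of attributes per FlowFile and the one-million-FlowFile workload are assumptions chosen only for illustration. It models a bin as holding every FlowFile's attributes on the heap until the bin is merged, and compares one stage that bundles all FlowFiles at once against two chained stages that each bundle at most n entries per bin.

```java
// Illustrative arithmetic only: hypothetical names and ASSUMED sizes, not NiFi code.
// Assumes each FlowFile waiting in a bin keeps its attributes (not its content)
// on the heap, so peak heap tracks how many FlowFiles can sit in open bins at once.
final class MergeHeapEstimate {

    // Upper bound on FlowFiles held at once when every open bin fills up.
    static long peakFlowFilesHeld(long maxEntriesPerBin, int maxOpenBins) {
        return maxEntriesPerBin * maxOpenBins;
    }

    // Rough heap estimate from an ASSUMED average attribute footprint per FlowFile.
    static long peakAttributeBytes(long maxEntriesPerBin, int maxOpenBins, long avgAttributeBytes) {
        return peakFlowFilesHeld(maxEntriesPerBin, maxOpenBins) * avgAttributeBytes;
    }

    // Smallest per-bin entry count n such that two chained merge stages
    // (n originals per first-stage bundle, n bundles per final bundle)
    // cover totalFlowFiles originals, i.e. n * n >= totalFlowFiles.
    static long entriesPerStage(long totalFlowFiles) {
        long n = (long) Math.ceil(Math.sqrt((double) totalFlowFiles));
        while (n * n < totalFlowFiles) {
            n++; // guard against floating-point rounding for very large inputs
        }
        return n;
    }

    public static void main(String[] args) {
        long avgAttributeBytes = 1024; // assumption: ~1 KB of attributes per FlowFile
        long total = 1_000_000;        // assumption: one million small FlowFiles

        // Single stage: one bin must hold every FlowFile before it can merge.
        long oneStage = peakAttributeBytes(total, 1, avgAttributeBytes);

        // Two stages: each bin holds at most n entries.
        long n = entriesPerStage(total);
        long perStageBin = peakAttributeBytes(n, 1, avgAttributeBytes);

        System.out.println("single stage: " + oneStage + " bytes in one bin");
        System.out.println("two stages: " + n + " entries/bin, " + perStageBin + " bytes per bin");
    }
}
```

Under these assumed numbers the single stage holds on the order of 1 GB of attributes in one bin, while each two-stage bin holds about 1 MB. The sketch ignores how many bins are open concurrently and any per-FlowFile overhead beyond attributes, so it only shows the shape of the trade-off, not actual heap usage.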
> NIFI component high resource usage annotation
> ---------------------------------------------
>
> Key: NIFI-4872
> URL: https://issues.apache.org/jira/browse/NIFI-4872
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Core Framework, Core UI
> Affects Versions: 1.5.0
> Reporter: Jeff Storck
> Assignee: Jeff Storck
> Priority: Critical
>
> NiFi Processors currently have no means to relay whether or not they may
> be resource intensive. The idea here would be to introduce an
> Annotation that can be added to Processors to indicate that they may cause high
> memory, disk, CPU, or network usage. For instance, any Processor that reads
> the FlowFile contents into memory (like many XML Processors) may
> cause high memory usage. Whether memory, disk, CPU, or network usage is
> actually high will ultimately depend on the FlowFiles being processed.
> Having many of these components in a dataflow increases the risk of
> OutOfMemoryErrors and performance degradation.
> The annotation should support one value from a fixed list: CPU, Disk,
> Memory, Network. It should also allow the developer to provide a custom
> description of the scenario in which the component would fall under the high
> usage category. It should be possible to specify the annotation multiple
> times, once for each resource the component has the potential to use heavily.
> By marking components with this new Annotation, we can update the generated
> Processor documentation to include this fact.
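The requirements above (one value from a fixed resource list, an optional custom description, repeatable once per resource, readable when generating documentation) map naturally onto a repeatable Java annotation. The sketch below defines local stand-ins modeled on the `@SystemResourceConsideration` / `SystemResource` names from the diff; the member names, the `SystemResourceConsiderations` container, `ExampleXmlProcessor`, and `ResourceDocs` are assumptions for illustration, not NiFi's actual definitions.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Repeatable;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

// Fixed list of resources, per the issue description (local stand-in).
enum SystemResource { CPU, DISK, MEMORY, NETWORK }

// Container annotation that Java's @Repeatable requires (hypothetical name).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface SystemResourceConsiderations {
    SystemResourceConsideration[] value();
}

// One consideration per resource; repeat the annotation for each resource.
// RUNTIME retention lets the reflection-based doc sketch below read it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@Repeatable(SystemResourceConsiderations.class)
@interface SystemResourceConsideration {
    SystemResource resource();
    String description() default ""; // assumption: description is optional
}

// Hypothetical processor that reads content into memory and parses XML.
@SystemResourceConsideration(resource = SystemResource.MEMORY,
        description = "Reads the entire FlowFile content into memory.")
@SystemResourceConsideration(resource = SystemResource.CPU,
        description = "Parses and transforms XML.")
class ExampleXmlProcessor { }

final class ResourceDocs {
    // Produce one documentation line per declared consideration.
    // getAnnotationsByType unwraps the container, so repeated annotations are all returned.
    static List<String> describe(Class<?> component) {
        List<String> lines = new ArrayList<>();
        for (SystemResourceConsideration c
                : component.getAnnotationsByType(SystemResourceConsideration.class)) {
            lines.add(c.resource() + ": " + c.description());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describe(ExampleXmlProcessor.class)) {
            System.out.println(line);
        }
    }
}
```

Using `getAnnotationsByType` rather than `getAnnotation` matters here: once an annotation is repeated, the compiler stores the copies inside the container, and `getAnnotation(SystemResourceConsideration.class)` would return `null`. A real documentation generator may read annotations differently; this only shows that the proposed shape is expressible with standard Java annotations.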
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)