GitHub user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2475#discussion_r171065265
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/SplitXml.java ---
    @@ -82,6 +84,7 @@
                     description = "The number of split FlowFiles generated from the parent FlowFile"),
             @WritesAttribute(attribute = "segment.original.filename ", description = "The filename of the parent FlowFile")
     })
    +@SystemResourceConsideration(resource = SystemResource.MEMORY)
    --- End diff --
    
    In this particular context, we are buffering the entirety of the FlowFile's 
content as a Document object, which can take approximately 10 times as much 
heap as the size of the XML (i.e., a 1 MB XML document may take 10 MB of 
heap), in addition to all of the generated FlowFile objects. A two-stage 
approach may well be necessary when there are a lot of splits, but even then, 
if the XML is large, you could still run out of heap space.
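
    As a rough illustration of that arithmetic, here is a minimal sketch (not 
NiFi code; the class and method names are hypothetical). It applies only the 
approximate 10x rule of thumb for the Document object and ignores the heap 
taken by the generated FlowFile objects, so treat the result as a lower bound.

```java
// Hypothetical helper, not part of NiFi: back-of-the-envelope heap estimate
// for parsing an XML payload into a DOM Document.
class DomHeapEstimate {

    // Rule-of-thumb multiplier (approximately 10x); an estimate, not a measurement.
    static final long DOM_OVERHEAD_FACTOR = 10L;

    // Estimated heap for the Document alone; throws ArithmeticException on overflow.
    static long estimateDomHeapBytes(long xmlBytes) {
        return Math.multiplyExact(xmlBytes, DOM_OVERHEAD_FACTOR);
    }

    // True if the estimated Document size is below the given heap limit.
    // Generated FlowFile objects are not counted here.
    static boolean likelyFitsInHeap(long xmlBytes, long maxHeapBytes) {
        return estimateDomHeapBytes(xmlBytes) < maxHeapBytes;
    }

    public static void main(String[] args) {
        long oneMb = 1024L * 1024L;
        long maxHeap = Runtime.getRuntime().maxMemory();

        System.out.println("Estimated heap for a 1 MB XML document: "
                + estimateDomHeapBytes(oneMb) / oneMb + " MB");
        System.out.println("A 500 MB XML document likely fits in this JVM's heap: "
                + likelyFitsInHeap(500L * oneMb, maxHeap));
    }
}
```

    The second line of output depends on the JVM's configured maximum heap, 
which is the point: a single large FlowFile can exceed it regardless of how 
the splits are staged.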

