Mark Payne created NIFI-7557:
--------------------------------

             Summary: Cache large/common FlowFile attributes when restoring 
FlowFile Repository
                 Key: NIFI-7557
                 URL: https://issues.apache.org/jira/browse/NIFI-7557
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Core Framework
            Reporter: Mark Payne
            Assignee: Mark Payne


When NiFi is restarted, it restores FlowFiles from the repository. Each 
attribute on a FlowFile is read from disk and put into a HashMap. There are 
times when a Processor will add a large attribute to every FlowFile that it 
sees, and this results in using much more heap upon NiFi restart to store 
FlowFiles than is used while NiFi is running. While running, the Processor 
holds the value of that attribute as a single String object and adds that same 
String to the HashMap of attributes on every FlowFile.

However, on restart, NiFi deserializes a byte stream to come up with the 
attribute value. As a result, each FlowFile that has that attribute value ends 
up with its own String object, even though the same value is repeated many 
times.
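
The difference can be illustrated with a small, self-contained sketch. The
class and method names here are hypothetical and are not taken from the NiFi
code base; decode() merely stands in for reading one attribute value from the
repository's byte stream.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class AttributeIdentityDemo {

    // Hypothetical stand-in for deserializing one attribute value from disk.
    // new String(byte[], Charset) always allocates a fresh String object.
    static String decode(byte[] serialized) {
        return new String(serialized, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String largeValue = "pretend this is a very large attribute value";

        // While running: the Processor puts the SAME String reference
        // into the attribute map of every FlowFile.
        Map<String, String> flowFile1 = new HashMap<>();
        Map<String, String> flowFile2 = new HashMap<>();
        flowFile1.put("large.attr", largeValue);
        flowFile2.put("large.attr", largeValue);
        System.out.println(flowFile1.get("large.attr") == flowFile2.get("large.attr")); // true

        // On restart: each FlowFile's value is decoded from bytes separately,
        // so equal values end up as distinct objects on the heap.
        byte[] onDisk = largeValue.getBytes(StandardCharsets.UTF_8);
        String restored1 = decode(onDisk);
        String restored2 = decode(onDisk);
        System.out.println(restored1.equals(restored2)); // true
        System.out.println(restored1 == restored2);      // false
    }
}
```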

Consequently, a large amount of heap may be used on restart, causing NiFi to 
hit an OutOfMemoryError (OOME) when attempting to restore the FlowFile 
Repository.
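
One way the caching named in the summary might look is sketched below. This is
only an illustration under stated assumptions, not the actual change: the
class name AttributeValueCache, the minLength threshold, and the maxEntries
bound are all hypothetical. The idea is that each freshly deserialized value
is passed through a small bounded cache, so that FlowFiles carrying an equal
large value end up sharing one String object again.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of NiFi. Not thread-safe as written.
class AttributeValueCache {
    private final int minLength;
    private final Map<String, String> cache;

    // minLength: only values at least this long are cached (assumption).
    // maxEntries: upper bound on cached values; least recently used is evicted.
    AttributeValueCache(final int minLength, final int maxEntries) {
        this.minLength = minLength;
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns a previously seen String equal to 'value' if one is cached;
    // otherwise caches and returns 'value' itself.
    String canonicalize(final String value) {
        if (value == null || value.length() < minLength) {
            return value; // small values are not worth the lookup
        }
        final String existing = cache.putIfAbsent(value, value);
        return existing == null ? value : existing;
    }
}
```

A restore loop would call canonicalize() on each attribute value just after
decoding it and store the returned reference in the FlowFile's attribute map.
The trade-off is one hash computation and lookup per large value, in exchange
for the duplicate copies becoming garbage immediately instead of staying on
the heap.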



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
