Github user apiri commented on the pull request: https://github.com/apache/nifi/pull/213#issuecomment-195367460

@mans2singh I think your logic for collecting the flowfiles should probably just continuously grab individual files one at a time. This continues until one of two scenarios is reached: the max buffer size is met or exceeded (based on the size of each flowfile collected so far), or there are no more files coming into the processor (one or more flowfiles have been collected but we have not yet eclipsed the max buffer size). In either case, we would close out that collected batch of files and send them on their way, much as you had before.

In terms of the 250MB default, this comes from the default batch size of 250 where, in the worst case, each file is 1MB in size. While each of these files is converted to a byte array for sending, they are continuously sitting on the heap. If multiple instances of this processor are running, we could quickly consume some big chunks of the heap.

What I am proposing is to either get rid of the batch size (that property no longer exists) or make it a secondary consideration: we try to receive a certain batch size, but first ensure we do not exceed the configured buffer, and second, that we do not exceed the batch size. The semantics for when batches are sent on their way would be the same as in the scenarios above, with the addition that a batch is also sent once the configured batch size has been reached. Does that clarify a bit?
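The batching loop described above could be sketched roughly as follows. This is a hypothetical, self-contained illustration, not NiFi API: the names `collectBatch`, `maxBufferSize`, and the queue of flowfile sizes are illustrative stand-ins for the processor's session and configured properties.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Sketch of the proposed collection loop: grab one flowfile at a time and
// close out the batch once the configured max buffer size is met or exceeded,
// or once no further flowfiles are available.
public class BatchCollector {

    // Drains one batch of flowfile sizes (bytes) from the incoming queue.
    static List<Long> collectBatch(Queue<Long> incoming, long maxBufferSize) {
        List<Long> batch = new ArrayList<>();
        long buffered = 0;
        while (buffered < maxBufferSize) {
            Long size = incoming.poll();   // grab one flowfile at a time
            if (size == null) {
                break;                     // no more files coming in
            }
            batch.add(size);
            buffered += size;              // loop exits once buffer is met/exceeded
        }
        return batch;                      // caller sends this batch on its way
    }

    public static void main(String[] args) {
        Queue<Long> incoming = new ArrayDeque<>(List.of(400L, 400L, 400L, 100L));
        // 400 + 400 < 1000, adding the third file reaches 1200 >= 1000,
        // so the first batch closes with three files.
        System.out.println(collectBatch(incoming, 1000L).size());
        // The remaining single file goes out as its own batch once the
        // queue runs dry.
        System.out.println(collectBatch(incoming, 1000L).size());
    }
}
```

Making the batch size a secondary limit, as proposed, would simply add one more exit condition to the loop (e.g. `batch.size() >= maxBatchSize`), with the buffer check still taking precedence.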