Github user apiri commented on the pull request:

    https://github.com/apache/nifi/pull/213#issuecomment-195367460
  
    @mans2singh I think your logic for collecting the flowfiles should probably 
just grab individual files one at a time.  This continues until one of two 
scenarios is reached: the max buffer size is met or exceeded (based on the size 
of each flowfile collected so far), or there are no more files coming into the 
processor (one or more flowfiles have been collected but we have not yet 
eclipsed the max buffer size).  In either case, we would close out that 
collected batch of files and send them on their way, much as you had before.
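    A minimal sketch of that collection loop, with a plain queue of flowfile sizes standing in for the processor's incoming queue (the names here are illustrative, not NiFi's actual API):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of the proposed batching loop: grab one flowfile at a time and
// close out the batch once the configured max buffer size is met or
// exceeded, or once no more flowfiles are available.  Each Long is a
// stand-in for one flowfile's size in bytes.
public class BatchSketch {
    static List<Long> collectBatch(Deque<Long> incoming, long maxBufferBytes) {
        List<Long> batch = new ArrayList<>();
        long buffered = 0;
        while (buffered < maxBufferBytes && !incoming.isEmpty()) {
            long size = incoming.poll();   // one flowfile at a time
            batch.add(size);
            buffered += size;              // running total of collected sizes
        }
        return batch;                      // close out and send on its way
    }

    public static void main(String[] args) {
        Deque<Long> queue = new ArrayDeque<>(List.of(1_000_000L, 1_000_000L, 1_000_000L));
        // batch closes after two 1MB files meet the 2MB buffer cap
        System.out.println(collectBatch(queue, 2_000_000L).size());
    }
}
```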
    
    In terms of the 250MB by default: this comes from the default batch size 
of 250 and, in the worst case, each file being 1MB in size.  While each of 
these files is converted to a byte array for sending, they are all sitting on 
the heap.  If multiple instances of this processor are running, we could 
quickly consume some big chunks of the heap.  What I am proposing is to either 
get rid of the batch size (that property no longer exists) or make it a 
secondary consideration: we try to receive a certain batch size, but first 
ensure we do not exceed the configured buffer and, second, do not exceed the 
batch size.  The semantics for when batches are sent on their way would be the 
same as in the scenarios above, plus sending when the batch size has been 
reached.
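    The secondary-limit variant could look something like this sketch, where the configured buffer size is the hard cap checked first and the batch size (count) is checked second; again, the queue of sizes and the method names are illustrative assumptions, not NiFi's API:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of the dual-limit batching: never exceed the configured max
// buffer (primary), and also stop once the batch size count is reached
// (secondary).  Each Long is a stand-in for one flowfile's size in bytes.
public class DualLimitSketch {
    static List<Long> collectBatch(Deque<Long> incoming, long maxBufferBytes, int maxBatchCount) {
        List<Long> batch = new ArrayList<>();
        long buffered = 0;
        while (!incoming.isEmpty()
                && buffered < maxBufferBytes        // first: do not exceed the buffer
                && batch.size() < maxBatchCount) {  // second: do not exceed the batch size
            long size = incoming.poll();
            batch.add(size);
            buffered += size;
        }
        return batch;
    }

    public static void main(String[] args) {
        Deque<Long> queue = new ArrayDeque<>();
        for (int i = 0; i < 300; i++) queue.add(1_000_000L); // 300 x 1MB flowfiles
        // worst case from above: a batch size of 250 with 1MB files buffers 250MB
        System.out.println(collectBatch(queue, 250_000_000L, 250).size());
    }
}
```

    With both limits in place, whichever is hit first closes out the batch, so the heap usage per processor instance stays bounded by the buffer setting regardless of the batch size.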
    
    Does that clarify a bit? 

