mattyb149 commented on pull request #4482:
URL: https://github.com/apache/nifi/pull/4482#issuecomment-681991302


   You're right on both counts. Setting a very large reservoir can cause
memory issues, but I chose to hold the reservoir in memory for performance
reasons. My thinking was that sampling is meant to select a (potentially much)
smaller set of records. If the total number of sampled records is still very
large, the flowfile could be split beforehand and then sampled. That changes
the overall behavior, of course, but I figured this implementation covers most
of the use cases.
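For context on the memory trade-off: reservoir sampling keeps at most k records in memory however many records stream past. The reservoir size, not the flowfile size, therefore drives memory use. Below is a minimal Java sketch of the classic Algorithm R that shows this bound. The class and method names (`ReservoirSampler`, `offer`, `sample`) are hypothetical. This is an illustration of the technique, not the code in this PR.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical sketch of in-memory reservoir sampling (Algorithm R).
 * Memory is bounded by the reservoir capacity k, not by the number of
 * records offered, but all k sampled records are held in memory at once.
 */
public class ReservoirSampler<T> {
    private final int capacity;
    private final Random random;
    private final List<T> reservoir;
    private long seen = 0;

    public ReservoirSampler(int capacity, Random random) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.random = random;
        this.reservoir = new ArrayList<>(capacity);
    }

    /** Offers one record; after n records, each is kept with probability k / n. */
    public void offer(T record) {
        seen++;
        if (reservoir.size() < capacity) {
            // Fill phase: keep the first k records unconditionally.
            reservoir.add(record);
        } else {
            // Pick a position in [0, seen); replace only if it lands in the reservoir.
            long j = (long) (random.nextDouble() * seen);
            if (j < capacity) {
                reservoir.set((int) j, record);
            }
        }
    }

    /** Read-only view of the current sample. */
    public List<T> sample() {
        return Collections.unmodifiableList(reservoir);
    }

    public static void main(String[] args) {
        ReservoirSampler<Integer> sampler = new ReservoirSampler<>(5, new Random(42));
        for (int i = 0; i < 1_000; i++) {
            sampler.offer(i);
        }
        // Five values drawn uniformly from 0..999; only five are ever held in memory.
        System.out.println(sampler.sample());
    }
}
```

Splitting the flowfile first, as suggested above, caps each sampler's record count. Each reservoir still holds at most k records per split.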
   Also, I didn't make the sampling strategies controller services (or even
separate class files), mostly because I didn't foresee any other use for them,
but they can always be refactored into their own files and/or components if
need be.
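If the strategies were later refactored into their own files, one option would be a small per-record interface with one implementation per strategy. The Java sketch below assumes that shape. All of the names in it (`SamplingStrategy`, `IntervalStrategy`, `ProbabilisticStrategy`, `apply`) are hypothetical. This is neither NiFi's controller service API nor the PR's actual classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of sampling strategies pulled out behind one interface. */
public class SamplingStrategies {

    /** Decides, record by record, whether a record belongs in the sample. */
    interface SamplingStrategy<T> {
        boolean include(T record);
    }

    /** Keeps every n-th record (1-based): n = 3 keeps the 3rd, 6th, 9th, ... */
    static final class IntervalStrategy<T> implements SamplingStrategy<T> {
        private final int interval;
        private long count = 0;

        IntervalStrategy(int interval) {
            if (interval <= 0) throw new IllegalArgumentException("interval must be positive");
            this.interval = interval;
        }

        public boolean include(T record) {
            return ++count % interval == 0;
        }
    }

    /** Keeps each record independently with the given probability. */
    static final class ProbabilisticStrategy<T> implements SamplingStrategy<T> {
        private final double probability;
        private final Random random;

        ProbabilisticStrategy(double probability, Random random) {
            this.probability = probability;
            this.random = random;
        }

        public boolean include(T record) {
            return random.nextDouble() < probability;
        }
    }

    /** Runs any strategy over a stream of records and collects the kept ones. */
    static <T> List<T> apply(Iterable<T> records, SamplingStrategy<T> strategy) {
        List<T> kept = new ArrayList<>();
        for (T record : records) {
            if (strategy.include(record)) {
                kept.add(record);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> records = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(apply(records, new IntervalStrategy<>(3))); // [3, 6, 9]
        System.out.println(apply(records, new ProbabilisticStrategy<>(0.5, new Random(1))));
    }
}
```

A reservoir strategy does not fit this per-record shape cleanly, because its final sample is only known after the last record. Keeping the strategies inline avoids designing that abstraction before a second use exists.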
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.