mattyb149 commented on pull request #4482: URL: https://github.com/apache/nifi/pull/4482#issuecomment-681991302
You're right on both counts. Setting a very large reservoir can cause memory issues, but I chose to do it in memory for performance reasons, figuring that sampling is meant to choose a (potentially much) smaller set of records; if the total number of sampled records is still very large, the flowfile could be split beforehand and then sampled. That changes the overall behavior, of course, but I figured most of the use cases are covered by this implementation. Also, I didn't make the sampling strategies controller services (or even separate class files), mostly because I didn't foresee any other usage of them, but they can always be refactored into their own files and/or components if need be.
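For readers following the memory trade-off, here is a minimal sketch of in-memory reservoir sampling (the classic "Algorithm R"). It is not the code from this pull request: the class name `ReservoirSketch`, the method `sample`, and its parameters are hypothetical and chosen only for illustration. It assumes records arrive through an `Iterator` and that the whole reservoir is held on the heap.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Hypothetical helper for illustration only; not a class from the NiFi codebase.
final class ReservoirSketch {

    // Returns up to reservoirSize items chosen uniformly at random from the stream.
    static <T> List<T> sample(Iterator<T> records, int reservoirSize, Random random) {
        // The whole reservoir lives in memory, so heap use grows with reservoirSize.
        List<T> reservoir = new ArrayList<>(reservoirSize);
        long seen = 0;
        while (records.hasNext()) {
            T record = records.next();
            seen++;
            if (reservoir.size() < reservoirSize) {
                // Fill phase: keep the first reservoirSize records unconditionally.
                reservoir.add(record);
            } else {
                // Replace phase: pick an index in [0, seen); the new record replaces
                // an existing one only if that index falls inside the reservoir.
                long j = (long) (random.nextDouble() * seen);
                if (j < reservoirSize) {
                    reservoir.set((int) j, record);
                }
            }
        }
        return reservoir;
    }
}
```

In this sketch the memory held is proportional to the reservoir size rather than to the number of input records, which is why only a very large reservoir is a concern; splitting the flowfile first bounds the reservoir per split, at the cost of sampling each split independently.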
