dhercher commented on pull request #15246: URL: https://github.com/apache/beam/pull/15246#issuecomment-893006636
My issue is that the reshuffle is having the inverse effect: because it forces full parallelism at this stage, it is very easy to cause OOM crash loops when reading many files at once. The Dataflow runner, at least, does not appear to scale down the number of threads when this happens. Removing the reshuffle at least lets users force the behavior they want when they know what that is, rather than imposing a single aggressive strategy on everyone. I suppose the feature flag could control whether the reshuffle occurs at all, keeping the current behavior by default and letting the user manage parallelism where needed.
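To make the flag idea concrete, here is a rough sketch in plain Python rather than the actual Beam code in this PR. The names `build_read_stages`, `with_reshuffle`, `expand_patterns`, `reshuffle`, and `read_files` are all hypothetical; in a real pipeline the optional stage would presumably be `beam.Reshuffle()`.

```python
from typing import Callable, Iterable, List

Stage = Callable[[Iterable[str]], Iterable[str]]


def expand_patterns(patterns: Iterable[str]) -> Iterable[str]:
    # Hypothetical stand-in for file matching: each pattern yields two files.
    for pattern in patterns:
        yield f"{pattern}-0"
        yield f"{pattern}-1"


def reshuffle(items: Iterable[str]) -> Iterable[str]:
    # Stand-in for a reshuffle: materialize everything so that a runner
    # could redistribute it. This is the stage that forces full parallelism.
    return list(items)


def read_files(files: Iterable[str]) -> Iterable[str]:
    # Hypothetical stand-in for reading: one record per file.
    for name in files:
        yield f"record from {name}"


def build_read_stages(with_reshuffle: bool = True) -> List[Stage]:
    # Default True keeps the current behavior; False lets the caller
    # manage parallelism themselves.
    stages: List[Stage] = [expand_patterns]
    if with_reshuffle:
        stages.append(reshuffle)
    stages.append(read_files)
    return stages


def run(stages: List[Stage], patterns: Iterable[str]) -> List[str]:
    # Apply each stage in order and collect the final records.
    data: Iterable[str] = patterns
    for stage in stages:
        data = stage(data)
    return list(data)
```

With the default the stage list is unchanged, so existing pipelines would not be affected; passing `with_reshuffle=False` only drops the middle stage and yields the same records.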
