On Thu, Feb 22, 2024 at 9:37 AM Reuven Lax via dev <dev@beam.apache.org> wrote:
>
> On Thu, Feb 22, 2024 at 9:26 AM Kenneth Knowles <k...@apache.org> wrote:
>>
>> Wow I love your input Reuven. Of course "the source" that you are applying
>> backpressure to is often a runner's shuffle so it may be state anyhow, but
>> it is good to give the runner the choice of how to figure that out and maybe
>> chain backpressure further.
>
>
> Sort of - however most (streaming) runners apply backpressure through shuffle
> as well. This means that while some amount of data will accumulate in
> shuffle, eventually the backpressure will push back to the source. Caveat of
> course is that this is mostly true for streaming runners, not batch runners.

For batch, it's still preferable to keep the data upstream in shuffle (which has fewer size limitations) rather than in state (which must reside in worker memory, though only one key at a time).
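To illustrate how backpressure chains back to the source, here is a minimal Python sketch. It is not Beam or runner code: a bounded `queue.Queue` stands in for a runner's shuffle, and the names `run_pipeline`, `SHUFFLE_CAPACITY`, `NUM_ELEMENTS`, and `consumer_delay` are hypothetical. Because `put` blocks whenever the buffer is full, a slow consumer stalls the source instead of letting data pile up without bound.

```python
import queue
import threading
import time

# Hypothetical parameters for the illustration; not Beam or runner settings.
SHUFFLE_CAPACITY = 4
NUM_ELEMENTS = 20


def run_pipeline(capacity=SHUFFLE_CAPACITY, n=NUM_ELEMENTS, consumer_delay=0.005):
    """Runs source -> bounded buffer -> slow consumer; returns (results, max_lag)."""
    shuffle = queue.Queue(maxsize=capacity)  # bounded stand-in for a runner's shuffle
    lock = threading.Lock()
    counts = {"produced": 0, "consumed": 0, "max_lag": 0}
    results = []
    end = object()  # sentinel marking the end of the input

    def source():
        for i in range(n):
            # put() blocks while the buffer is full, so a slow consumer
            # stalls the source: backpressure propagates upstream.
            shuffle.put(i)
            with lock:
                counts["produced"] += 1
                lag = counts["produced"] - counts["consumed"]
                counts["max_lag"] = max(counts["max_lag"], lag)
        shuffle.put(end)

    def slow_consumer():
        while True:
            item = shuffle.get()
            if item is end:
                break
            time.sleep(consumer_delay)  # simulate expensive downstream work
            results.append(item)
            with lock:
                counts["consumed"] += 1

    threads = [threading.Thread(target=source), threading.Thread(target=slow_consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, counts["max_lag"]


if __name__ == "__main__":
    results, max_lag = run_pipeline()
    print(f"consumed {len(results)} elements; source was at most {max_lag} ahead")
```

In this sketch the source never gets more than `capacity + 1` elements ahead of the consumer: at most `capacity` sit in the buffer, plus the one the consumer is currently processing. Passing `capacity=0` (which `queue.Queue` treats as unbounded) removes the backpressure, so the source is free to run ahead and the excess data must be held somewhere downstream.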