I don't believe the Dataflow worker code is very useful for dynamic work rebalancing. Good dynamic work rebalancing needs support and signals from each runner. I believe there is a way to build a simple dynamic work rebalancing system that would work for all bounded splits by performing a limited amount of graph rewriting at pipeline submission time and then periodically splitting sources while they run. Supporting unbounded sources would additionally require the runner to support a self loop.
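
To make the "periodic splitting while running sources" idea concrete, here is a rough, runner-agnostic sketch in plain Python. It is only an illustration under stated assumptions: `OffsetRange`, `ToyRangeReader`, `try_split`, and `run_with_periodic_splits` are hypothetical names invented for this example, not Beam or Dataflow APIs, and the work list stands in for whatever mechanism a runner would use to schedule the split-off remainder.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OffsetRange:
    start: int  # inclusive
    stop: int   # exclusive


class ToyRangeReader:
    """Hypothetical bounded reader over [start, stop); not a Beam API."""

    def __init__(self, rng: OffsetRange):
        self.range = rng
        self.position = rng.start  # next offset to be read

    def read_next(self) -> Optional[int]:
        if self.position >= self.range.stop:
            return None
        value = self.position
        self.position += 1
        return value

    def try_split(self, fraction_of_remainder: float) -> Optional[OffsetRange]:
        """Keep roughly the given fraction of the unread work and hand back the tail.

        Returns None when there is nothing left that can be split off.
        """
        remaining = self.range.stop - self.position
        keep = int(remaining * fraction_of_remainder)
        split_at = self.position + max(keep, 1)
        if split_at >= self.range.stop:
            return None
        residual = OffsetRange(split_at, self.range.stop)
        self.range = OffsetRange(self.range.start, split_at)
        return residual


def run_with_periodic_splits(rng: OffsetRange, split_every: int = 3) -> List[int]:
    """Read a range, splitting off the tail every few reads (assumed policy)."""
    pending = [rng]          # stand-in for work the runner could hand to other workers
    processed: List[int] = []
    while pending:
        reader = ToyRangeReader(pending.pop())
        reads = 0
        while (value := reader.read_next()) is not None:
            processed.append(value)
            reads += 1
            if reads % split_every == 0:
                residual = reader.try_split(0.5)
                if residual is not None:
                    pending.append(residual)
    return processed
```

In this sketch every offset is read exactly once, because a split only shrinks the current range and moves the tail onto the pending list. A real runner would decide when to split from its own signals (for example idle workers) rather than from a fixed read count.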
[ Full content available at: https://github.com/apache/beam/pull/6181 ]
