damccorm commented on issue #33103: URL: https://github.com/apache/beam/issues/33103#issuecomment-2950050273
Probably the primary work that would need to happen here is we'd need a way of tracking partially computed pcollections and handling them here - https://github.com/apache/beam/blob/c37785b18d5e12d216ca026ed113b4be34bea34c/sdks/python/apache_beam/runners/interactive/recording_manager.py#L430 - so that we don't fully recompute the result. From there, there are 3 things we can do when we eventually call collect or another call is made with wait_for_inputs: 1) Just wait for the partially completed computations to finish (easiest, but also means waiting a potentially long time to start the next phase) 2) Wait only for the computations which are inputs to the current pcollection (harder, but allows us to potentially parallelize more (e.g. if writing to a sink, we won't block the rest of pipeline execution on that running) 3) Somehow dynamically update the executing pipeline (seems infeasible, but would be really cool) Ideally we'd do (2) IMO > blocking: If False, the computation will run in non-blocking fashion. In Colab/IPython environment this mode will also provide the controls for the running pipeline. If True, the computation will block until the pipeline is done. Does a blocking version just mean we do the same thing as `collect` more or less? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org