damccorm commented on issue #33103:
URL: https://github.com/apache/beam/issues/33103#issuecomment-2950050273

   Probably the primary work needed here is a way of tracking partially 
computed PCollections and handling them here: 
https://github.com/apache/beam/blob/c37785b18d5e12d216ca026ed113b4be34bea34c/sdks/python/apache_beam/runners/interactive/recording_manager.py#L430
 so that we don't fully recompute the result. From there, there are three 
things we can do when we eventually call `collect`, or when another call is 
made with `wait_for_inputs`:
   
   1) Just wait for all partially completed computations to finish (easiest, 
but it can mean waiting a long time before the next phase starts)
   2) Wait only for the computations that are inputs to the current 
PCollection (harder, but potentially allows more parallelism; e.g. if we are 
writing to a sink, the rest of pipeline execution won't block on that write 
finishing)
   3) Somehow dynamically update the executing pipeline (seems infeasible, but 
would be really cool)
   
   Ideally we'd do (2) IMO
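   
   To make (2) concrete, here is a rough sketch of the idea, not a proposal 
for the actual implementation. It uses no Beam code: PCollections are plain 
string names, `deps` is a hypothetical map from each PCollection to its direct 
inputs, and `in_flight` is a hypothetical map from a PCollection name to the 
`Future` of the background job computing it. None of these names exist in 
`recording_manager.py`.

```python
from concurrent.futures import wait


def upstream_closure(target, deps):
    """Return every PCollection name that `target` transitively depends on.

    `deps` maps a name to the names of its direct inputs (hypothetical
    structure, standing in for the real pipeline graph).
    """
    seen = set()
    stack = list(deps.get(target, ()))
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(deps.get(name, ()))
    return seen


def wait_for_inputs(target, deps, in_flight, timeout=None):
    """Block only on in-flight computations that feed `target`.

    `in_flight` maps a name to the Future of a computation that is still
    running (or already finished) in the background. The target itself is
    included so that a computation already started for it is reused rather
    than recomputed. Unrelated branches (e.g. a sink write) are ignored.

    Returns (names waited on, futures still not done after `timeout`).
    """
    needed = upstream_closure(target, deps) | {target}
    waited_on = {name for name in needed if name in in_flight}
    _, not_done = wait([in_flight[name] for name in waited_on], timeout=timeout)
    return waited_on, not_done
```

   With a graph like `a -> b -> c` plus an unrelated `a -> sink` branch, 
calling `wait_for_inputs("c", ...)` would wait on `a` and `b` only, so a slow 
sink write would not delay the next phase.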
   
   > blocking: If False, the computation will run in non-blocking fashion. In
   Colab/IPython environment this mode will also provide the controls for the
   running pipeline. If True, the computation will block until the pipeline
   is done.
   
   Does a blocking version just mean we do the same thing as `collect` more or 
less?


