KevinGG commented on pull request #14839:
URL: https://github.com/apache/beam/pull/14839#issuecomment-848303431

> > It is possible to use interactive runner to capture a pipeline and send as a complete beam job. Are those impacted? (/cc @KevinGG )
>
> I'm not sure about that, does it officially support dataflow? The problem with the remote execution is that most of the global context is not transferred (such as imports, functions/classes called from transforms, global variables etc). With byref=False we 'accidentally' transfer the whole globally defined transform class, but not anything else from the global scope (so if the transform e.g. calls an import/function, it won't work remotely). So I think InteractiveRunner needs a way of transferring globals which is a) exclusive to it and b) more robust in terms of what get transferred. Maybe as a quick short-term option InteractiveRunner could set the dill byref option globally to preserve the current behavior.
>
> For all the other (non-interactive) pipelines byref=False only increases the pipeline proto size and makes local pipeline roundtrips buggy.

I don't think the InteractiveRunner is specifically affected.

Whether InteractiveRunner uses DataflowRunner as its underlying runner, a customer runs a Dataflow job from a notebook, or a customer runs a Python script to start a Dataflow job, the problem we and our customers need to solve is the same as for any other normal Dataflow usage: https://cloud.google.com/dataflow/docs/resources/faq#how_do_i_handle_nameerrors

It says:

> By default, global imports, functions, and variables defined in the main session are not saved during the serialization of a Dataflow job. If, for example, your DoFns are defined in the main file and reference imports and functions in the global namespace, you can set the --save_main_session pipeline option to True. This will cause the state of the global namespace to be pickled and loaded on the Dataflow worker.

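
To make the FAQ passage concrete, here is a minimal sketch of why main-session globals go missing on a remote worker. It uses only the standard-library `pickle`, not Beam's dill-based pickler, so treat it as an analogy rather than Beam's actual code path. The module name `fake_main_session` and the helpers `make_main_session` and `try_load` are hypothetical and exist only for this illustration.

```python
import pickle
import sys
import types

MODULE_NAME = "fake_main_session"  # hypothetical stand-in for the main session


def make_main_session():
    """Build a throwaway module with a global import and a global function."""
    module = types.ModuleType(MODULE_NAME)
    source = (
        "import math\n"
        "def circle_area(radius):\n"
        "    return math.pi * radius * radius\n"
    )
    exec(source, module.__dict__)
    return module


def try_load(payload):
    """Return the unpickled object, or the ImportError raised while loading."""
    try:
        return pickle.loads(payload)
    except ImportError as error:
        return error


# "Submitting" side: the main session exists, so pickling succeeds.
main_session = make_main_session()
sys.modules[MODULE_NAME] = main_session

# pickle stores the function by reference (module name + qualified name),
# not its code or the globals it depends on.
payload = pickle.dumps(main_session.circle_area)

# "Worker" side without the main session: the reference cannot be resolved.
del sys.modules[MODULE_NAME]
missing = try_load(payload)

# "Worker" side with the main session restored: the same payload loads fine.
sys.modules[MODULE_NAME] = main_session
restored = try_load(payload)
del sys.modules[MODULE_NAME]

print(type(missing).__name__)
print(restored(1.0))
```

The payload carries only a name pointing back into the main session, so the worker can resolve it only if that session's state is shipped too. In Beam that is the job of the `--save_main_session` option described in the FAQ.
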
As long as the change preserves Beam's existing behavior, I do not think there is any specific action item for InteractiveRunner.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: [email protected]
