KevinGG commented on pull request #14839:
URL: https://github.com/apache/beam/pull/14839#issuecomment-848303431


   > > It is possible to use the interactive runner to capture a pipeline and
   > > send it as a complete Beam job. Are those impacted? (/cc @KevinGG )
   >
   > I'm not sure about that. Does it officially support Dataflow? The problem
   > with remote execution is that most of the global context is not
   > transferred (imports, functions/classes called from transforms, global
   > variables, etc.). With byref=False we 'accidentally' transfer the whole
   > globally defined transform class, but nothing else from the global scope
   > (so if the transform calls, e.g., an import or a global function, it
   > won't work remotely). So I think InteractiveRunner needs a way of
   > transferring globals that is a) exclusive to it and b) more robust in
   > terms of what gets transferred. Maybe as a quick short-term option,
   > InteractiveRunner could set the dill byref option globally to preserve
   > the current behavior.
   > 
   > For all the other (non-interactive) pipelines, byref=False only increases
   > the pipeline proto size and makes local pipeline roundtrips buggy.
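
   To make the by-reference versus by-value distinction in the quoted text concrete, here is a minimal sketch that uses only the standard-library `pickle` module. It is not dill and not Beam's pickler, and the helper name `pickles_by_reference` is hypothetical. It only illustrates the general idea that a by-reference pickle records a module name and a qualified name, so the receiving side must be able to import that name itself.

```python
import json
import pickle


def pickles_by_reference(obj):
    """Return True if stdlib pickle round-trips obj to the very same object.

    For functions and classes this holds when pickle stored a reference
    (module name plus qualified name) and resolved it by import on load.
    """
    try:
        return pickle.loads(pickle.dumps(obj)) is obj
    except (pickle.PicklingError, AttributeError):
        # pickle found no importable name to record, so it refuses.
        return False


# A function importable from a module is stored as a name, not as code.
payload = pickle.dumps(json.dumps)
print(b"json" in payload and b"dumps" in payload)

# A lambda has no importable name, so stdlib pickle cannot reference it.
print(pickles_by_reference(json.dumps), pickles_by_reference(lambda x: x + 1))
```

   Serializing by value avoids the dependency on an importable name but puts the object's definition into the payload, which matches the proto size concern raised in the quoted text.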
   
   I don't think the InteractiveRunner is specifically affected.
   Whether InteractiveRunner uses DataflowRunner as its underlying runner, a
   customer runs a Dataflow job from a notebook, or a customer starts a
   Dataflow job from a Python script, the problem we and the customers need to
   solve is the same as for any other Dataflow usage:
   https://cloud.google.com/dataflow/docs/resources/faq#how_do_i_handle_nameerrors
   It says:
   > By default, global imports, functions, and variables defined in the main
   > session are not saved during the serialization of a Dataflow job. If, for
   > example, your DoFns are defined in the main file and reference imports and
   > functions in the global namespace, you can set the --save_main_session
   > pipeline option to True. This will cause the state of the global namespace
   > to be pickled and loaded on the Dataflow worker.
   
   As long as the change makes sure that Beam still preserves the same
   behavior, I do not think there is any specific action item for
   InteractiveRunner.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

