svetakvsundhar commented on code in PR #26236:
URL: https://github.com/apache/beam/pull/26236#discussion_r1164468409


##########
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md:
##########
@@ -141,3 +141,12 @@ However, it may be possible to pre-build the SDK containers and perform the depe
 Dataflow, see [Pre-building the python SDK custom container image with extra dependencies](https://cloud.google.com/dataflow/docs/guides/using-custom-containers#prebuild).
 
 **NOTE**: This feature is available only for the `Dataflow Runner v2`.
+
+## Pickling and Managing the Main Session
+
+Pickling in the Python SDK can capture the state of the global namespace, but by default, global imports, functions, and variables defined in the main session are not saved during the serialization of a Dataflow job.
+Thus, you might encounter an unexpected `NameError` when running a `DoFn` on the Dataflow Runner. To resolve this, manage the main session by
+setting `--save_main_session=True`. This loads the pickled state of the global namespace onto the Dataflow workers.
+For more information, see [Handling NameErrors](https://cloud.google.com/dataflow/docs/guides/common-errors#how-do-i-handle-nameerrors).
+
+**NOTE**: This applies only to the Python SDK executing with the `dill` pickler on the Dataflow Runner.
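For readers following the thread, here is a minimal sketch of the pattern the new section describes. The `save_main_session` option itself comes from the text above; the pipeline contents (the `re` import and the regex step) are illustrative assumptions, not part of the change:

```python
import re  # a module-level import; remote workers do not see this by default

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# save_main_session=True pickles the main module's global namespace
# (including the `re` import above) and restores it on the workers.
options = PipelineOptions(save_main_session=True)

with beam.Pipeline(options=options) as p:
    (p
     | beam.Create(["Hello, World!"])
     # Without save_main_session, referencing the global `re` inside this
     # lambda can raise NameError when it runs on the Dataflow Runner.
     | beam.Map(lambda line: re.sub(r"[^\w\s]", "", line))
     | beam.Map(print))
```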

Review Comment:
   ah thanks for the catch! updating to make it clearer -- this doesn't apply on `DirectRunner`.


