This is an automated email from the ASF dual-hosted git repository.

tvalentyn pushed a commit to branch tvalentyn-patch-2
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 8fde9f46db9760ee6b5d220706485d595eccbaeb
Author: tvalentyn <[email protected]>
AuthorDate: Tue Nov 25 16:13:39 2025 -0800

    Update python-pipeline-dependencies.md
---
 .../en/documentation/sdks/python-pipeline-dependencies.md  | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md b/website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
index fefc2d12513..b6ffac92f1d 100644
--- a/website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
+++ b/website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
@@ -163,22 +163,20 @@ Dataflow, see [Pre-building the python SDK custom container image with extra dep
 ## Pickling and Managing the Main Session
 
 When the Python SDK submits the pipeline for execution to a remote runner, the pipeline contents, such as transform user code, is serialized (or pickled) into a bytecode using
-libraries that perform the serialization (also called picklers).  The default pickler library used by Beam is `dill`.
-To use the `cloudpickle` pickler, supply the `--pickle_library=cloudpickle` pipeline option.
+libraries that perform the serialization (also called picklers).  On Apache Beam 2.64.0 or earlier, the default pickler library was `dill`.
 
-By default, global imports, functions, and variables defined in the main pipeline module are not saved during the serialization of a Beam job.
+When the `dill` pickler is used, global imports, functions, and variables defined in the main pipeline module are not saved during the serialization of a Beam job by default.
 Thus, one might encounter an unexpected `NameError` when running a `DoFn` on any remote runner. To resolve this, supply the main session content with the pipeline by
 setting the `--save_main_session` pipeline option. This will load the pickled state of the global namespace onto the Dataflow workers (if using `DataflowRunner`).
 For example, see [Handling NameErrors](https://cloud.google.com/dataflow/docs/guides/common-errors#name-error) to set the main session on the `DataflowRunner`.
 
-Managing the main session in Python SDK is only necessary when using `dill` pickler on any remote runner. Therefore, this issue will
-not occur in `DirectRunner`.
-
 Since serialization of the pipeline happens on the job submission, and deserialization happens at runtime, it is imperative that the same version of pickling library is used at job submission and at runtime.
-To ensure this, Beam typically sets a very narrow supported version range for pickling libraries. If for whatever reason, users cannot use the version of `dill` or `cloudpickle` required by Beam, and choose to
-install a custom version, they must also ensure that they use the same custom version at runtime (e.g. in their custom container,
+To ensure this, Beam users who use `dill` and choose to install a custom version of `dill` must also ensure that they use the same custom version at runtime (e.g. in their custom container,
 or by specifying a pipeline dependency requirement).
 
+The `--save_main_session` pipeline option is not necessary when the `cloudpickle` pickler is used, which is the default pickler on Apache Beam 2.65.0 and later versions.
+To use the `cloudpickle` pickler on earlier Beam versions, supply the `--pickle_library=cloudpickle` pipeline option.
+
 ## Control the dependencies the pipeline uses {#control-dependencies}
 
 ### Pipeline environments
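
The main-session behavior the diff describes can be illustrated with the standard library's `pickle` (a sketch only; Beam actually uses `dill` or `cloudpickle`, and the names below are illustrative, not from the Beam codebase). Stdlib `pickle` serializes a module-level function by reference, storing only its module and qualified name, so globals the function depends on are not embedded in the payload. This is analogous to why a `DoFn` deserialized on a remote worker can raise `NameError` for main-module globals unless the main session is shipped with `--save_main_session`.

```python
import pickle

OFFSET = 10  # module-level global, analogous to main-session state

def add_offset(x):
    # Depends on OFFSET from the enclosing module's namespace.
    return x + OFFSET

payload = pickle.dumps(add_offset)

# The payload holds a reference (module + qualified name), not the
# function body or its globals: the name appears, the global does not.
assert b"add_offset" in payload
assert b"OFFSET" not in payload

# Unpickling in the SAME process finds OFFSET in the live main module,
# so the call succeeds here. On a remote worker, the main module's
# globals were never shipped, and the equivalent lookup raises NameError.
assert pickle.loads(payload)(5) == 15
```

By-value picklers such as `cloudpickle` capture the referenced globals alongside the function, which is why `--save_main_session` is unnecessary when it is the active pickler.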
