tvalentyn commented on code in PR #26331:
URL: https://github.com/apache/beam/pull/26331#discussion_r1172518937


##########
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md:
##########
@@ -146,8 +146,12 @@ Dataflow, see [Pre-building the python SDK custom container image with extra dep

 Pickling in the Python SDK is set up to pickle the state of the global namespace. By default, global imports, functions, and variables defined in the main session are not saved during the serialization of a Beam job.
 Thus, one might encounter an unexpected `NameError` when running a `DoFn` on any remote runner using portability. To resolve this, manage the main session by
-simply setting the main session. This will load the pickled state of the global namespace onto the Dataflow workers.
+simply setting the main session, `--save_main_session`. This will load the pickled state of the global namespace onto the Dataflow workers.

Review Comment:
   ```suggestion
   setting the `--save_main_session` pipeline option. This will load the pickled state of the global namespace onto the Dataflow workers.
   ```
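
   For context, a minimal sketch of the option this suggestion names; the pipeline below is illustrative, not code from the PR:

   ```python
   import re  # a module-level import that lives in the main session

   import apache_beam as beam
   from apache_beam.options.pipeline_options import PipelineOptions

   # --save_main_session asks Beam to pickle the main module's global
   # namespace (including the `re` import above) and restore it on workers.
   options = PipelineOptions(['--save_main_session'])

   with beam.Pipeline(options=options) as p:
       (p
        | beam.Create(['a1', 'b2'])
        # Without --save_main_session, the global `re` would not exist on a
        # remote worker and this lambda would raise NameError there.
        | beam.Map(lambda s: re.sub(r'\d', '', s)))
   ```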



##########
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md:
##########
@@ -146,8 +146,12 @@ Dataflow, see [Pre-building the python SDK custom container image with extra dep

 Pickling in the Python SDK is set up to pickle the state of the global namespace. By default, global imports, functions, and variables defined in the main session are not saved during the serialization of a Beam job.
 Thus, one might encounter an unexpected `NameError` when running a `DoFn` on any remote runner using portability. To resolve this, manage the main session by
-simply setting the main session. This will load the pickled state of the global namespace onto the Dataflow workers.
+simply setting the main session, `--save_main_session`. This will load the pickled state of the global namespace onto the Dataflow workers.
 For example, see [Handling NameErrors](https://cloud.google.com/dataflow/docs/guides/common-errors#how-do-i-handle-nameerrors) to set the main session on the `DataflowRunner`.

+The dill pickler is the default pickler in the Python SDK.

Review Comment:
   ```suggestion
   ```



##########
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md:
##########
@@ -146,8 +146,12 @@ Dataflow, see [Pre-building the python SDK custom container image with extra dep

 Pickling in the Python SDK is set up to pickle the state of the global namespace. By default, global imports, functions, and variables defined in the main session are not saved during the serialization of a Beam job.
 Thus, one might encounter an unexpected `NameError` when running a `DoFn` on any remote runner using portability. To resolve this, manage the main session by
-simply setting the main session. This will load the pickled state of the global namespace onto the Dataflow workers.
+simply setting the main session, `--save_main_session`. This will load the pickled state of the global namespace onto the Dataflow workers.
 For example, see [Handling NameErrors](https://cloud.google.com/dataflow/docs/guides/common-errors#how-do-i-handle-nameerrors) to set the main session on the `DataflowRunner`.

+The dill pickler is the default pickler in the Python SDK.
+
 **NOTE**: This applies to the Python SDK executing with the dill pickler on any remote runner using portability. Therefore, this issue will

Review Comment:
   ```suggestion
   **NOTE**: This applies to the Python SDK executing with the dill pickler on any remote runner. Therefore, this issue will
   ```
   
   (removing dev jargon which may be confusing to users)



##########
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md:
##########
@@ -146,8 +146,12 @@ Dataflow, see [Pre-building the python SDK custom container image with extra dep

 Pickling in the Python SDK is set up to pickle the state of the global namespace. By default, global imports, functions, and variables defined in the main session are not saved during the serialization of a Beam job.
 Thus, one might encounter an unexpected `NameError` when running a `DoFn` on any remote runner using portability. To resolve this, manage the main session by

Review Comment:
   ```suggestion
   Thus, one might encounter an unexpected `NameError` when running a `DoFn` on any remote runner. To resolve this, supply the main session content with the pipeline by
   ```
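
   To illustrate the `NameError` scenario this suggestion describes, here is a hypothetical `DoFn` (names invented for this sketch) that depends on main-session state:

   ```python
   import logging  # imported in the main module, i.e. the main session

   import apache_beam as beam

   class LogAndPass(beam.DoFn):  # hypothetical DoFn, for illustration only
       def process(self, element):
           # On a remote runner, the DoFn is unpickled on a worker where the
           # main module was never executed; without the main session content,
           # the global name `logging` is undefined and this raises NameError.
           logging.info('processing %s', element)
           yield element
   ```

   Supplying the main session with the pipeline ships these globals to the workers; moving the import inside `process` (or `setup`) avoids the dependency altogether.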



##########
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md:
##########
@@ -146,8 +146,12 @@ Dataflow, see [Pre-building the python SDK custom container image with extra dep

 Pickling in the Python SDK is set up to pickle the state of the global namespace. By default, global imports, functions, and variables defined in the main session are not saved during the serialization of a Beam job.

Review Comment:
   ```suggestion
   When the Python SDK submits the pipeline for execution to a remote runner, the pipeline contents, such as transform user code, are serialized (or pickled) into bytecode using libraries that perform the serialization (also called picklers). The default pickler library used by Beam is `dill`. By default, global imports, functions, and variables defined in the main pipeline module are not saved during the serialization of a Beam job.
   ```
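
   Since the suggestion introduces pickler terminology, a small standalone sketch of `dill` (plain `dill` usage, independent of Beam) may help ground it:

   ```python
   import dill  # the default pickler library in the Beam Python SDK

   def make_adder(n):
       # Returns a closure; the standard library `pickle` cannot serialize
       # closures like this, which is one reason Beam defaults to dill.
       def add(x):
           return x + n
       return add

   payload = dill.dumps(make_adder(5))  # serialize the function to bytes
   restored = dill.loads(payload)       # reconstruct it, e.g. on a worker
   assert restored(3) == 8
   ```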


